WO2022166293A1 - Target detection method and apparatus
- Publication number
- WO2022166293A1 (application PCT/CN2021/130102)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- downsampling
- target detection
- yolo
- feature map
- detection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- the present application relates to the technical field of computer vision, and in particular, to a target detection method and device.
- YOLO (You Only Look Once) is a typical single-stage target detection technology: information such as the position and category of the target is regressed directly from the original image. It has been developed to its fourth version, YOLO-v4.
- Figure 1 shows a schematic diagram of the network structure of YOLO-v4. It contains a downsampling structure composed of multiple downsampling layers, but this arrangement has some disadvantages. For example, in industrial defect detection scenarios some defects are still difficult to identify accurately, so the technology leaves room for improvement.
- the embodiments of the present application provide a target detection method and apparatus, so as to further improve the accuracy of target detection.
- an embodiment of the present application provides a target detection method, including: setting at least one adjustment method for the downsampling structure of the YOLO-v4 backbone network based on the characteristics of the target to be detected; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment method to build a target detection model based on YOLO-v4; inputting the detection image into the target detection model, extracting the downsampling feature map of the detection image by the target detection model, and obtaining the target detection result according to the downsampling feature map; the size of the downsampling feature map is determined according to the adjusted downsampling structure.
- an embodiment of the present application further provides a target detection device, including: an adjustment unit, configured to set at least one adjustment method for the downsampling structure of the YOLO-v4 backbone network based on the characteristics of the target to be detected; a construction unit, configured to adjust the downsampling structure of the YOLO-v4 backbone network using the adjustment method to build a target detection model based on YOLO-v4; and a detection unit, configured to input the detection image into the target detection model, extract the downsampling feature map of the detection image by the target detection model, and obtain the target detection result according to the downsampling feature map; the size of the downsampling feature map is determined according to the adjusted downsampling structure.
- embodiments of the present application further provide an electronic device, including: a processor; and a memory arranged to store computer-executable instructions, the executable instructions, when executed, cause the processor to execute the above target detection method.
- embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and when the one or more programs are executed by an electronic device including multiple application programs, the electronic device performs the above target detection method.
- the above-mentioned at least one technical solution adopted in the embodiments of the present application can achieve the following beneficial effects: YOLO-v4 is selected to construct a target detection model, and an adjustment method for the downsampling structure is set based on the characteristics of the target to be detected, so that the adjusted target detection model produces downsampling feature maps of adjusted size, on the basis of which target detection achieves higher accuracy.
- Taking an industrial defect detection scene as an example, scratches, hair fibers, etc. are imaged as linear and small targets. If the original downsampling structure is used to process the inspection images, repeated downsampling significantly reduces the detection performance, while the improved target detection model effectively solves this problem.
- Figure 1 shows a schematic diagram of the network structure of YOLO-v4
- Fig. 2 is the feature map size of each down-sampling layer output shown on the basis of the network structure of Fig. 1;
- FIG. 3 shows a schematic flowchart of a target detection method according to an embodiment of the present application
- FIG. 4 is a feature map size output by each downsampling layer shown on the basis of a network structure of a target detection model according to an embodiment of the present application;
- FIG. 5 is a feature map size output by each downsampling layer shown on the basis of a network structure of a target detection model according to another embodiment of the present application;
- Fig. 6 is the feature map size of each downsampling layer output shown on the basis of the network structure of the target detection model according to still another embodiment of the present application;
- Fig. 7 is the feature map size of each downsampling layer output shown on the basis of the network structure of the target detection model according to still another embodiment of the present application;
- FIG. 8 shows a network schematic diagram of a target detection model according to an embodiment of the present application.
- FIG. 9 shows a network diagram of a target detection model according to another embodiment of the present application.
- FIG. 10 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of an electronic device in an embodiment of the present application.
- Fig. 2 shows the feature map size output by each downsampling layer based on the network structure shown in Fig. 1 .
- the size of the input image is 416*416 (the unit is pixels, the same below), with three RGB channels (shown as 416*416*3 in Figure 2; the numbers marked on the following layers have the same meaning and are not explained one by one). The image is first processed into a 416*416*32 feature map (the corresponding network structure is not shown in Figure 1).
- A 208*208 feature map is then obtained after the 1/2 downsampling layer, a 104*104 feature map after the 1/4 downsampling layer, a 52*52 feature map after the 1/8 downsampling layer, a 26*26 feature map after the 1/16 downsampling layer, and a 13*13 feature map after the 1/32 downsampling layer.
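- The size progression above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming each downsampling layer divides the spatial size by its stride exactly; the function name `feature_map_sizes` and the stride list are illustrative and do not come from the application.

```python
def feature_map_sizes(input_size, strides):
    """Return the spatial size (one side, in pixels) after each downsampling layer."""
    sizes = []
    size = input_size
    for stride in strides:
        size = size // stride  # a stride-2 layer halves the side length
        sizes.append(size)
    return sizes


# Original YOLO-v4 backbone: five downsampling layers (1/2 ... 1/32), each with stride 2.
original_strides = [2, 2, 2, 2, 2]
print(feature_map_sizes(416, original_strides))  # [208, 104, 52, 26, 13]
```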
- the idea of the prior art is to down-sample as much as possible, while the design idea of the present invention is to reduce down-sampling so as to improve the accuracy.
- FIG. 3 shows a schematic flowchart of a target detection method according to an embodiment of the present application. As shown in Figure 3, the method includes:
- Step S310: based on the characteristics of the target to be detected, set at least one adjustment method for the downsampling structure of the YOLO-v4 backbone network.
- the target to be detected here can be any object that needs to be detected, such as a vehicle or a defect. The "characteristics" do not refer to the tensor features obtained by a neural network, but to appearance features such as slenderness and small size.
- the tensor features obtained by using the neural network are represented by "feature map" in the following text.
- the adjustment method here can be to reduce the number of downsampling operations performed by the downsampling structure, or to reduce their effect.
- Step S320: adjust the downsampling structure of the YOLO-v4 backbone network using the adjustment method, and construct a target detection model based on YOLO-v4.
- Step S330: input the detection image into the target detection model, extract the downsampling feature map of the detection image by the target detection model, and obtain the target detection result according to the downsampling feature map; the size of the downsampling feature map is determined according to the adjusted downsampling structure.
- the specific detection method of YOLO-v4 need not be changed: the detection branches derived from the downsampling layers are used, and the target detection result is obtained from the downsampling feature maps using the anchor box groups, through operations such as upsampling, concatenation, and convolution.
- it may also be considered to add detection branches, and corresponding embodiments will be introduced later.
- the method shown in Figure 3 can improve the accuracy of target detection by adjusting the down-sampling structure. Taking an industrial defect detection scene as an example, scratches, hair fibers, etc. are imaged as linear and small targets. If the original downsampling structure is used to process the detection image, multiple downsampling will significantly reduce the detection performance, and the improved target detection model effectively solves this problem.
- using the adjustment method to adjust the downsampling structure of the YOLO-v4 backbone network includes: adjusting the stride of at least one downsampling layer in the downsampling structure.
- In the original structure, the stride of the 1/8 downsampling layer is 2; taking a detection image of size 416*416 as an example, the downsampling feature map it outputs is 52*52.
- If the stride is adjusted from 2 to 1, the effect shown in Figure 4 is obtained: for the same 416*416 detection image, the downsampling feature map output by the 1/8 downsampling layer remains 104*104, the same size as the downsampling feature map output by the 1/4 downsampling layer (only the number of channels changes).
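- The effect of the stride change can be checked with the standard convolution output-size formula. The sketch below assumes the downsampling layer is a 3*3 convolution with padding 1, which is a common configuration but is not stated in the application; `conv_output_size` is an illustrative helper name.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Standard output-size formula for a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Input to the 1/8 downsampling layer is the 104*104 map from the 1/4 layer.
print(conv_output_size(104, stride=2))  # 52  (original stride)
print(conv_output_size(104, stride=1))  # 104 (stride adjusted from 2 to 1)
```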
- Example 1 was superior to Comparative Example 1 on multiple indicators: it had an advantage of about 6 percentage points in mean average precision (mAP), about 1 percentage point in recall, and about 4 percentage points in detection precision.
- adjusting the down-sampling structure of the YOLO-v4 backbone network by means of adjustment includes: deleting one or more down-sampling layers in the down-sampling structure.
- Removing a downsampling layer directly reduces the number of downsampling operations.
- reducing the number of downsampling also makes the size of the downsampled feature maps relatively large, which in turn increases the training and inference time of the object detection model.
- adjusting the downsampling structure of the YOLO-v4 backbone network by means of adjustment includes: deleting any one of the 1/4 downsampling layer and the 1/32 downsampling layer.
- Figure 5 shows the size of the feature map output by each downsampling layer after deleting the 1/4 downsampling layer;
- Figure 6 shows the size of the feature map output by each downsampling layer after deleting the 1/32 downsampling layer.
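- As a rough illustration of how deleting a layer changes the sizes, the sketch below removes one named layer from the chain and recomputes the side lengths for a 416*416 input. It assumes every remaining layer keeps stride 2; the layer labels and the helper `sizes_after_deletion` are illustrative, and the output is plain arithmetic rather than a reproduction of Figures 5 and 6.

```python
LAYERS = ["1/2", "1/4", "1/8", "1/16", "1/32"]  # original downsampling layers


def sizes_after_deletion(input_size, deleted, stride=2):
    """Map each remaining layer name to its output side length after deleting one layer."""
    sizes = {}
    size = input_size
    for name in LAYERS:
        if name == deleted:
            continue  # the deleted layer no longer reduces the resolution
        size //= stride
        sizes[name] = size
    return sizes


print(sizes_after_deletion(416, "1/4"))
# {'1/2': 208, '1/8': 104, '1/16': 52, '1/32': 26}
print(sizes_after_deletion(416, "1/32"))
# {'1/2': 208, '1/4': 104, '1/8': 52, '1/16': 26}
```

- Under these assumptions, both deletions leave a smallest feature map of 26*26 instead of 13*13; the two variants differ in which layers, and therefore which channel widths, remain in the network.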
- the target detection model corresponding to Figure 5 is 255M bytes, while the target detection model corresponding to Figure 6 is only 73.7M bytes. A smaller model saves memory space on the device where the target detection model is deployed.
- using the adjustment method to adjust the downsampling structure of the YOLO-v4 backbone network further includes: halving the number of channels of each network structure originally connected after the deleted downsampling layer.
- Figure 7 shows the size of the feature map output by each downsampling layer after deleting the 1/4 downsampling layer and halving the number of channels of each subsequent network structure.
- the target detection model corresponding to Figure 7 is only 64.2M bytes. Compared with the target detection model of Comparative Example 1 (256M bytes), the volume is also reduced, which can save the memory space of the device for deploying the target detection model.
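- The size reduction from halving channels follows from how convolution weights scale. The sketch below counts the weights of a single convolution, assuming a 3*3 kernel and no bias term; the channel counts 256 and 512 are hypothetical and are not taken from the application, and the sketch does not attempt to reproduce the 64.2M figure.

```python
def conv_weight_count(c_in, c_out, kernel=3):
    """Number of weights in one convolution layer (bias omitted)."""
    return kernel * kernel * c_in * c_out


full = conv_weight_count(256, 512)    # hypothetical original channel counts
halved = conv_weight_count(128, 256)  # input and output channels both halved
print(full, halved, halved / full)    # 1179648 294912 0.25
```

- When both the input and output channels of a layer are halved, its weight count drops to a quarter, which is why halving the channels of every structure after the deleted layer shrinks the model substantially.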
- constructing a target detection model based on YOLO-v4 includes: adding a detection branch based on a specified downsampling layer in the adjusted downsampling structure.
- the backbone network of the original YOLO-v4 has three detection branches, which are respectively connected to the 1/8 downsampling layer, the 1/16 downsampling layer and the 1/32 downsampling layer.
- Fig. 8 shows a network diagram of a target detection model according to an embodiment of the present application.
- Fig. 8 introduces a newly added detection branch, so the target detection model has four detection branches in total.
- the down-sampling feature map obtained by the 1/4 down-sampling layer can be used for target detection to improve the accuracy.
- the scheme of adding detection branches and the aforementioned scheme of reducing downsampling can be used in combination, for example, firstly, the downsampling structure is adjusted, and then the setting method of the detection branch is adjusted based on the adjusted downsampling structure.
- constructing a target detection model based on YOLO-v4 further includes: setting anchor boxes used by each detection branch in the target detection model according to the added detection branch.
- the anchor box is a reference box selected during target detection; its specific usage can be implemented with reference to the prior art.
- the solution of the embodiments proposed in this application may adjust only the number of anchor boxes.
- the detection branch derived from the 1/8 downsampling layer uses three sets of anchor boxes with serial numbers 0, 1, and 2;
- the derived detection branch uses three groups of anchor boxes with serial numbers 3, 4, and 5;
- setting the anchor boxes used by each detection branch in the target detection model includes: allocating a first preset number of anchor box groups among the detection branches, where the first preset number is the number of anchor box groups used by the original YOLO-v4 backbone network and the number of anchor box groups assigned to each detection branch is not 0; or, increasing the number of anchor box groups from the first preset number to a second preset number, and distributing the second preset number of anchor box groups equally among the detection branches.
- the 9 anchor box groups used in the original YOLO-v4 backbone network can be reassigned to all the current detection branches.
- For example, the detection branch derived from the 1/4 downsampling layer uses the anchor box group with serial number 0; the detection branch derived from the 1/8 downsampling layer uses the three anchor box groups with serial numbers 1, 2, and 3; the detection branch derived from the 1/16 downsampling layer uses the three anchor box groups with serial numbers 4, 5, and 6; and the detection branch derived from the 1/32 downsampling layer uses the two anchor box groups with serial numbers 7 and 8.
- each detection branch can use the same number of anchor frame groups.
- FIG. 9 shows a network diagram of a target detection model according to another embodiment of the present application, in which the newly added detection branch also uses three anchor box groups, so that 12 anchor box groups are used in total.
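- The two allocation schemes described above (reassigning the original 9 groups, or expanding to 12 groups shared equally) can be sketched as follows. The helper `assign_anchor_groups` and the branch labels are illustrative; the per-branch counts 1, 3, 3, 2 and 3, 3, 3, 3 follow the examples in the text.

```python
def assign_anchor_groups(counts):
    """Assign consecutive anchor box group serial numbers to detection branches.

    counts[i] is the number of groups given to branch i; every branch must get at least one.
    """
    if any(count <= 0 for count in counts):
        raise ValueError("every detection branch needs at least one anchor box group")
    assignment = []
    start = 0
    for count in counts:
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


branches = ["1/4", "1/8", "1/16", "1/32"]

# Scheme 1: keep the original 9 groups and redistribute them over four branches.
print(dict(zip(branches, assign_anchor_groups([1, 3, 3, 2]))))
# {'1/4': [0], '1/8': [1, 2, 3], '1/16': [4, 5, 6], '1/32': [7, 8]}

# Scheme 2: increase to 12 groups and give each branch the same number.
print(dict(zip(branches, assign_anchor_groups([3, 3, 3, 3]))))
# {'1/4': [0, 1, 2], '1/8': [3, 4, 5], '1/16': [6, 7, 8], '1/32': [9, 10, 11]}
```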
- the target detection method further includes: pruning the target detection model.
- Reducing down-sampling and adding detection branches in the backbone network can improve the detection performance of linear or small-volume targets, but it will also bring more computation.
- the target detection model can be pruned.
- the network slimming pruning algorithm can be selected: the target detection model is sparsely trained to obtain sparse γ parameters (provided that the target detection model uses batch normalization (BN) layers with a γ parameter), and the input channels and/or output channels of the convolutional layers are then pruned based on the sparse γ parameters.
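- The channel-selection step of such pruning can be sketched in plain Python. This is a simplified illustration, assuming the γ values of one BN layer are already available after sparse training and that a fixed fraction of channels with the smallest |γ| is removed per layer; the function `select_channels`, the sample γ values, and the 50% ratio are all hypothetical, and a full implementation would typically choose a threshold across all BN layers and rebuild the convolution weights accordingly.

```python
def select_channels(gammas, prune_ratio):
    """Return the indices of channels to keep, dropping those with the smallest |gamma|."""
    n_prune = int(len(gammas) * prune_ratio)
    # Rank channel indices from least to most important by |gamma|.
    order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    pruned = set(order[:n_prune])
    return [i for i in range(len(gammas)) if i not in pruned]


# Hypothetical BN scaling factors after sparse training: several are close to zero.
gammas = [0.91, 0.002, 0.45, -0.001, 0.30, 0.0005]
print(select_channels(gammas, prune_ratio=0.5))  # [0, 2, 4]
```

- The kept indices select the output channels of the preceding convolution and, correspondingly, the input channels of the following one.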
- FIG. 10 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application.
- the target detection apparatus 1000 includes:
- the adjustment unit 1010 is configured to set at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network based on the characteristics of the target to be detected.
- the target to be detected here can be any object that needs to be detected, such as a vehicle or a defect. The "characteristics" do not refer to the tensor features obtained by a neural network, but to appearance features such as slenderness and small size.
- the tensor features obtained by using the neural network are represented by "feature map" in the following text.
- the adjustment method here can be to reduce the number of downsampling operations performed by the downsampling structure, or to reduce their effect.
- the construction unit 1020 is configured to adjust the down-sampling structure of the YOLO-v4 backbone network by using an adjustment method to construct a target detection model based on YOLO-v4.
- the detection unit 1030 is configured to input the detection image into the target detection model, extract the downsampling feature map of the detection image by the target detection model, and obtain the target detection result according to the downsampling feature map; the size of the downsampling feature map is determined according to the adjusted downsampling structure.
- the specific detection method of YOLO-v4 need not be changed: the detection branches derived from the downsampling layers are used, and the target detection result is obtained from the downsampling feature maps using the anchor box groups, through operations such as upsampling, concatenation, and convolution.
- it may also be considered to add detection branches, and corresponding embodiments will be introduced later.
- the device shown in Figure 10 can improve the accuracy of target detection by adjusting the down-sampling structure. Taking an industrial defect detection scene as an example, scratches, hair fibers, etc. are imaged as linear and small targets. If the original downsampling structure is used to process the detection image, multiple downsampling will significantly reduce the detection performance, and the improved target detection model effectively solves this problem.
- the construction unit 1020 is configured to adjust the step size of at least one downsampling layer in the downsampling structure.
- a construction unit 1020 is configured to delete one or more downsampling layers in the downsampling structure. In some embodiments, in the object detection apparatus, the construction unit 1020 is configured to delete any one of the 1/4 downsampling layer and the 1/32 downsampling layer. In some embodiments, in the target detection apparatus, the construction unit 1020 is configured to halve the number of channels of each network structure originally connected after the deleted downsampling layer. In some embodiments, in the target detection apparatus, the construction unit 1020 is configured to add detection branches based on the specified down-sampling layer in the adjusted down-sampling structure.
- the construction unit 1020 is configured to set the anchor boxes used by each detection branch in the target detection model according to the added detection branch. In some embodiments, in the target detection apparatus, the construction unit 1020 is configured to allocate a first preset number of anchor box groups among the detection branches, where the first preset number is the number of anchor box groups used by the original YOLO-v4 backbone network and the number of anchor box groups assigned to each detection branch is not 0; or, to increase the number of anchor box groups from the first preset number to a second preset number and distribute the second preset number of anchor box groups evenly among the detection branches.
- the target detection apparatus further includes: a pruning unit, configured to perform pruning processing on the target detection model.
- target detection apparatus can implement each step of the target detection method performed by the target detection server provided in the foregoing embodiments, and the relevant explanations about the target detection method are applicable to the target detection apparatus, and will not be repeated here.
- FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory.
- the memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
- the electronic equipment may also include hardware required for other services.
- the processor, network interface and memory can be connected to each other through an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, etc.
- the bus can be divided into address bus, data bus, control bus and so on. For ease of representation, only one bidirectional arrow is shown in FIG. 11, but it does not mean that there is only one bus or one type of bus.
- the program may include program code, and the program code includes computer operation instructions.
- the memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
- the processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming a target detection device on a logical level.
- the target detection device shown in FIG. 11 does not constitute a limitation on the number of target detection devices in the present application.
- the processor executes the program stored in the memory, and is specifically used to perform the following operations:
- based on the characteristics of the target to be detected, setting at least one adjustment method for the downsampling structure of the YOLO-v4 backbone network; using the adjustment method to adjust the downsampling structure of the YOLO-v4 backbone network to build a target detection model based on YOLO-v4;
- the detection image is input into the target detection model, the down-sampling feature map of the detection image is extracted by the target detection model, and the target detection result is obtained according to the down-sampling feature map; the size of the down-sampling feature map is determined according to the adjusted down-sampling structure.
- the above-mentioned method performed by the target detection apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor, or implemented by a processor.
- a processor may be an integrated circuit chip with signal processing capabilities.
- each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
- the above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
- the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
- the electronic device can also perform the method performed by the target detection apparatus in FIG. 1 , and implement the functions of the target detection apparatus in the embodiment shown in FIG. 10 , and details are not described herein again in this embodiment of the present application.
- the embodiments of the present application also provide a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and the one or more programs include instructions. When the instructions are executed by an electronic device including multiple application programs, the electronic device is caused to execute the method executed by the target detection apparatus in the embodiment shown in FIG. 1, and is specifically used to execute:
- based on the characteristics of the target to be detected, setting at least one adjustment method for the downsampling structure of the YOLO-v4 backbone network; using the adjustment method to adjust the downsampling structure of the YOLO-v4 backbone network to build a target detection model based on YOLO-v4;
- the detection image is input into the target detection model, the down-sampling feature map of the detection image is extracted by the target detection model, and the target detection result is obtained according to the down-sampling feature map; the size of the down-sampling feature map is determined according to the adjusted down-sampling structure.
- the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
- Memory may include forms of non-persistent memory, random access memory (RAM) and/or non-volatile memory in computer readable media, such as read only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
- Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
- Information may be computer readable instructions, data structures, modules of programs, or other data.
- Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
- computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Quality & Reliability (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
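The present application discloses a target detection method and apparatus. The method includes: based on the characteristics of the target to be detected, setting at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode to construct a YOLO-v4-based target detection model; and inputting a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.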
Description
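The present application relates to the technical field of computer vision, and in particular to a target detection method and apparatus.
Background of the Invention
YOLO (You Only Look Once) is a typical single-stage target detection technique: it regresses the position, class, and other information of targets directly from the original image. It has now reached its fourth version, YOLO-v4. Figure 1 shows a schematic diagram of the YOLO-v4 network structure, which contains a downsampling structure composed of multiple downsampling layers. This design has shortcomings, however. In industrial defect detection scenarios, for example, some defects remain difficult to identify accurately, so the technique still has room for improvement.
It should be noted that the statements here merely provide background information related to the present application and do not necessarily constitute prior art.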
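Summary of the Invention
The embodiments of the present application provide a target detection method and apparatus to further improve the accuracy of target detection.
The embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a target detection method, including: based on the characteristics of the target to be detected, setting at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode to construct a YOLO-v4-based target detection model; and inputting a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.
In a second aspect, an embodiment of the present application further provides a target detection apparatus, including: an adjustment unit, configured to set, based on the characteristics of the target to be detected, at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; a construction unit, configured to adjust the downsampling structure of the YOLO-v4 backbone network using the adjustment mode and construct a YOLO-v4-based target detection model; and a detection unit, configured to input a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the target detection method described above.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing one or more programs which, when executed by an electronic device that includes multiple application programs, cause the electronic device to perform the target detection method described above.
At least one of the above technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects: YOLO-v4 is chosen to construct the target detection model, and the adjustment mode for the downsampling structure is set based on the characteristics of the target to be detected, so that the adjusted target detection model produces downsampled feature maps of adjusted size, and target detection performed on this basis achieves higher accuracy. Taking industrial defect detection as an example, targets such as scratches and fibers image as linear and small; if the detection image is processed with the original downsampling structure, repeated downsampling noticeably degrades detection performance, and the improved target detection model effectively solves this problem.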
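Brief Description of the Drawings
The drawings described here are provided for further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
Figure 1 shows a schematic diagram of the YOLO-v4 network structure;
Figure 2 shows, on the basis of the network structure of Figure 1, the size of the feature map output by each downsampling layer;
Figure 3 shows a schematic flowchart of a target detection method according to one embodiment of the present application;
Figure 4 shows, on the basis of the network structure of a target detection model according to one embodiment of the present application, the size of the feature map output by each downsampling layer;
Figure 5 shows, on the basis of the network structure of a target detection model according to another embodiment of the present application, the size of the feature map output by each downsampling layer;
Figure 6 shows, on the basis of the network structure of a target detection model according to yet another embodiment of the present application, the size of the feature map output by each downsampling layer;
Figure 7 shows, on the basis of the network structure of a target detection model according to a further embodiment of the present application, the size of the feature map output by each downsampling layer;
Figure 8 shows a schematic network diagram of a target detection model according to one embodiment of the present application;
Figure 9 shows a schematic network diagram of a target detection model according to another embodiment of the present application;
Figure 10 shows a schematic structural diagram of a target detection apparatus according to one embodiment of the present application;
Figure 11 is a schematic structural diagram of an electronic device in an embodiment of the present application.
To make the purposes, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.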
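Figure 2 shows, on the basis of the network structure of Figure 1, the size of the feature map output by each downsampling layer. As shown in Figure 2, for an input image of 416*416 (in pixels, likewise below) with three RGB channels (the 416*416*3 shown in Figure 2; the numbers marked on the following layers have the same kind of meaning and are not explained one by one), the image is first processed into a 416*416*32 feature map (the corresponding network structure is not shown in Figure 1). It then passes in turn through the 1/2 downsampling layer to give a 208*208 feature map, the 1/4 downsampling layer to give a 104*104 feature map, the 1/8 downsampling layer to give a 52*52 feature map, the 1/16 downsampling layer to give a 26*26 feature map, and the 1/32 downsampling layer to give a 13*13 feature map.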
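The size progression described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming each downsampling layer halves the spatial size exactly (stride 2, with padding chosen so the size divides evenly); the function name `feature_map_sizes` is hypothetical and not taken from the application.

```python
def feature_map_sizes(input_size, strides):
    """Return the spatial size after each downsampling layer, in order."""
    sizes = []
    size = input_size
    for stride in strides:
        # Assumption: padding is chosen so the size divides evenly by the stride.
        size = size // stride
        sizes.append(size)
    return sizes


# Five stride-2 layers: 1/2, 1/4, 1/8, 1/16, 1/32 of the 416*416 input.
ORIGINAL_STRIDES = [2, 2, 2, 2, 2]
print(feature_map_sizes(416, ORIGINAL_STRIDES))  # [208, 104, 52, 26, 13]
```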
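The original YOLO-v4 is designed this way because repeated downsampling yields small feature maps, and performing target detection on small feature maps greatly increases the inference speed of the model.
However, the inventors found that for common natural objects that image as planar and relatively large, repeated downsampling hardly reduces detection accuracy, whereas for objects that image as linear and small (especially fine fiber defects and fine impurity defects in industrial inspection), repeated downsampling noticeably degrades the detection performance of the model.
In other words, the prior art aims to downsample as much as possible, whereas the design idea of the present invention is to reduce downsampling and thereby improve accuracy.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.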
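Figure 3 shows a schematic flowchart of a target detection method according to one embodiment of the present application. As shown in Figure 3, the method includes:
Step S310: based on the characteristics of the target to be detected, set at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network.
Here, the target to be detected may be any object that needs to be detected, such as a vehicle or a defect. The "characteristics" do not refer to tensor features obtained by a neural network, but to apparent characteristics such as being slender or small. To distinguish the two, the term "feature map" is used hereinafter for tensor features obtained by a neural network.
As described above, for targets that image as linear and small, too many downsampling operations reduce detection accuracy. The adjustment mode here may therefore be to reduce the number of downsampling operations performed by the downsampling structure, or to reduce their effect.
Step S320: adjust the downsampling structure of the YOLO-v4 backbone network using the adjustment mode, and construct a YOLO-v4-based target detection model.
Step S330: input a detection image into the target detection model; the target detection model extracts a downsampled feature map of the detection image and obtains a target detection result from the downsampled feature map. The size of the downsampled feature map is determined by the adjusted downsampling structure.
Here, the specific detection approach of YOLO-v4 is not changed. With reference to Figures 1 and 2, the detection branches led out from the downsampling layers apply operations such as upsampling, concatenation, and convolution, and use anchor box groups to obtain the target detection result from the downsampled feature maps. To improve detection, adding detection branches may also be considered; the corresponding embodiments are described later.
It can be seen that the method shown in Figure 3 improves target detection accuracy by adjusting the downsampling structure. Taking industrial defect detection as an example, targets such as scratches and fibers image as linear and small; if the detection image is processed with the original downsampling structure, repeated downsampling noticeably degrades detection performance, and the improved target detection model effectively solves this problem.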
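In some embodiments of the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode includes: adjusting the stride of at least one downsampling layer in the downsampling structure.
For example, in the YOLO-v4 backbone network the stride of the 1/8 downsampling layer is 2. As Figure 2 shows, for a detection image of 416*416, the downsampled feature map obtained from the 1/8 downsampling layer is 52*52. If this stride is changed from 2 to 1, the result shown in Figure 4 is obtained: for a detection image of 416*416, the downsampled feature map obtained from the 1/8 downsampling layer remains 104*104, the same size as the one obtained from the 1/4 downsampling layer (the number of channels changes).
The target detection model built from the original YOLO-v4 is taken as Comparative Example 1, and the target detection model obtained by setting the stride of the 1/8 downsampling layer to 1 and leaving everything else unchanged is taken as Embodiment 1. After training on the same sample set, both were used to detect the images in the experimental set. The experimental data show that Embodiment 1 outperforms Comparative Example 1 on several metrics: about 6 percentage points higher in mean average precision (mAP), about 1 percentage point higher in recall, and about 4 percentage points higher in precision.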
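The effect of the stride change on feature map sizes can be sketched as follows. This is an illustrative calculation only, assuming the layers keep their original labels and that every layer other than the 1/8 layer keeps stride 2. The text states only the 104*104 result for the 1/8 layer; the sizes for the later layers are what the arithmetic implies under these assumptions. The helper name `sizes_with_strides` is hypothetical.

```python
def sizes_with_strides(input_size, strides):
    """Map each layer label to its output size, applying strides in order."""
    size = input_size
    result = {}
    for name, stride in strides.items():
        size //= stride  # assumes the size divides evenly by the stride
        result[name] = size
    return result


original = {"1/2": 2, "1/4": 2, "1/8": 2, "1/16": 2, "1/32": 2}
# Embodiment 1: only the stride of the 1/8 downsampling layer changes to 1.
adjusted = {**original, "1/8": 1}

print(sizes_with_strides(416, original))
# {'1/2': 208, '1/4': 104, '1/8': 52, '1/16': 26, '1/32': 13}
print(sizes_with_strides(416, adjusted))
# {'1/2': 208, '1/4': 104, '1/8': 104, '1/16': 52, '1/32': 26}
```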
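In some embodiments of the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode includes: deleting one or more downsampling layers from the downsampling structure.
Deleting downsampling layers reduces the number of downsampling operations directly. However, fewer downsampling operations also make the downsampled feature maps relatively large, which in turn increases the training and inference time of the target detection model.
Through experiments, the inventors found a more balanced solution. In some embodiments of the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode includes: deleting either the 1/4 downsampling layer or the 1/32 downsampling layer. Figure 5 shows the size of the feature map output by each downsampling layer after the 1/4 downsampling layer is deleted; Figure 6 shows the size of the feature map output by each downsampling layer after the 1/32 downsampling layer is deleted.
The target detection model corresponding to Figure 5 is 255M bytes, and the one corresponding to Figure 6 is only 73.7M bytes. Compared with the target detection model of Comparative Example 1 (256M bytes), both are smaller, which saves memory on the device where the target detection model is deployed.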
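A rough sketch of how deleting a layer changes the size progression is given below. It assumes the remaining layers keep their original labels and stride 2; the per-layer sizes of Figures 5 and 6 are not stated in the text, so the values here are only what this assumption implies. The names `LAYERS` and `sizes_after_deletion` are hypothetical.

```python
LAYERS = ["1/2", "1/4", "1/8", "1/16", "1/32"]


def sizes_after_deletion(input_size, deleted):
    """Return (label, size) pairs for the layers that remain after deletion."""
    size = input_size
    remaining = []
    for name in LAYERS:
        if name in deleted:
            continue  # a deleted layer performs no downsampling
        size //= 2  # assumption: every remaining layer keeps stride 2
        remaining.append((name, size))
    return remaining


print(sizes_after_deletion(416, {"1/4"}))
# [('1/2', 208), ('1/8', 104), ('1/16', 52), ('1/32', 26)]
print(sizes_after_deletion(416, {"1/32"}))
# [('1/2', 208), ('1/4', 104), ('1/8', 52), ('1/16', 26)]
```

Under these assumptions, either deletion leaves the smallest feature map at 26*26 rather than 13*13.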
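In some embodiments of the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode further includes: halving the number of channels of each network structure originally connected after the deleted downsampling layer. For example, Figure 7 shows the size of the feature map output by each downsampling layer after the 1/4 downsampling layer is deleted and the number of channels of each subsequent network structure is halved.
The target detection model corresponding to Figure 7 is only 64.2M bytes. Compared with the target detection model of Comparative Example 1 (256M bytes), it is smaller, which saves memory on the device where the target detection model is deployed.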
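Why halving channel counts shrinks a model so sharply can be seen from the weight count of a single convolution, which is proportional to the product of its input and output channels. The sketch below uses hypothetical channel counts (512 and 1024) chosen only for illustration; they are not taken from the application, and biases and batch-normalization parameters are ignored.

```python
def conv_weights(c_in, c_out, kernel=3):
    """Number of weights in a kernel*kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


# Hypothetical layer: 512 input channels, 1024 output channels.
full = conv_weights(512, 1024)
# The same layer after halving both its input and output channels.
halved = conv_weights(256, 512)

print(full, halved, halved / full)  # 4718592 1179648 0.25
```

When both the input and output channels of a layer are halved, its weight count falls to one quarter. The reported sizes (64.2M bytes against 256M bytes) are in roughly that proportion, although the text does not break the size down by layer.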
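In some embodiments of the target detection method, constructing the YOLO-v4-based target detection model includes: adding a detection branch based on a designated downsampling layer in the adjusted downsampling structure.
As Figure 1 shows, the backbone network of the original YOLO-v4 has three detection branches, connected after the 1/8 downsampling layer, the 1/16 downsampling layer, and the 1/32 downsampling layer respectively.
Adding detection branches allows target detection to be performed on more downsampled feature maps and can therefore also improve detection accuracy. For example, Figure 8 shows a schematic network diagram of a target detection model according to one embodiment of the present application. Compared with the backbone network of the original YOLO-v4 shown in Figure 1, Figure 8 leads out a new detection branch from the 1/4 downsampling layer, so the target detection model has four detection branches in total. Compared with the backbone network of the original YOLO-v4, it can use the downsampled feature map obtained from the 1/4 downsampling layer for target detection and thereby improve accuracy.
It should be noted that adding detection branches can be combined with the reduced-downsampling solutions described above, for example by first adjusting the downsampling structure and then adjusting the arrangement of detection branches based on the adjusted structure. In the network structure shown in Figure 7, although the 1/4 downsampling layer is deleted, a new detection branch can be led out from the 1/2 downsampling layer; in the network structure shown in Figure 6, although the 1/32 downsampling layer is deleted, detection branches can be led out from the 1/2, 1/4, 1/8, and 1/16 downsampling layers; and so on.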
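In some embodiments of the target detection method, constructing the YOLO-v4-based target detection model further includes: setting, according to the added detection branch, the anchor boxes used by each detection branch in the target detection model.
Anchor boxes are the reference boxes used during target detection. They can be used in the manner of the prior art; the solutions of the embodiments of the present application may adjust only the number of anchor boxes.
As Figure 1 shows, in the backbone network of the original YOLO-v4, the detection branch led out from the 1/8 downsampling layer uses the three anchor box groups numbered 0, 1, and 2; the detection branch led out from the 1/16 downsampling layer uses the three anchor box groups numbered 3, 4, and 5; and the detection branch led out from the 1/32 downsampling layer uses the three anchor box groups numbered 6, 7, and 8.
Because a detection branch has been added, it is necessary to determine which anchor boxes the new branch uses.
In some embodiments of the target detection method, setting, according to the added detection branch, the anchor boxes used by each detection branch in the target detection model includes: allocating a first preset number of anchor box groups to the detection branches, where the first preset number is the number of anchor box groups used by the original YOLO-v4 backbone network and every detection branch is allocated at least one anchor box group; or increasing the number of anchor box groups from the first preset number to a second preset number and distributing the second preset number of anchor box groups evenly among the detection branches.
Two feasible anchor box allocation schemes are given here, and either may be chosen according to actual needs. In one scheme, the 9 anchor box groups used in the backbone network of the original YOLO-v4 are redistributed among all current detection branches. For example, referring to Figure 8, the new detection branch (led out from the 1/4 downsampling layer) uses the anchor box group numbered 0; the detection branch led out from the 1/8 downsampling layer uses the three anchor box groups numbered 1, 2, and 3; the detection branch led out from the 1/16 downsampling layer uses the three anchor box groups numbered 4, 5, and 6; and the detection branch led out from the 1/32 downsampling layer uses the two anchor box groups numbered 7 and 8.
Alternatively, in the other scheme, every detection branch uses the same number of anchor box groups. For example, Figure 9 shows a schematic network diagram of a target detection model according to another embodiment of the present application, in which the new detection branch also uses three anchor box groups, for 12 anchor box groups in total.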
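The two allocation schemes can be expressed as one small helper that hands out consecutive anchor box group indices to the branches. This is a sketch under assumptions: the function name `assign_anchor_groups` is hypothetical, and the consecutive numbering used for the 12-group case is an assumption, since the text gives only the group count for Figure 9.

```python
def assign_anchor_groups(branches, counts):
    """Assign consecutive anchor box group indices to each detection branch."""
    if len(branches) != len(counts):
        raise ValueError("one count is required per branch")
    if any(count < 1 for count in counts):
        raise ValueError("every branch must receive at least one group")
    assignment = {}
    next_index = 0
    for branch, count in zip(branches, counts):
        assignment[branch] = list(range(next_index, next_index + count))
        next_index += count
    return assignment


branches = ["1/4", "1/8", "1/16", "1/32"]

# Scheme 1: redistribute the original 9 groups (the Figure 8 example).
print(assign_anchor_groups(branches, [1, 3, 3, 2]))
# {'1/4': [0], '1/8': [1, 2, 3], '1/16': [4, 5, 6], '1/32': [7, 8]}

# Scheme 2: grow to 12 groups and share them evenly, three per branch.
print(assign_anchor_groups(branches, [3, 3, 3, 3]))
# {'1/4': [0, 1, 2], '1/8': [3, 4, 5], '1/16': [6, 7, 8], '1/32': [9, 10, 11]}
```

The check that every count is at least 1 mirrors the requirement that no detection branch is left without an anchor box group.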
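In some embodiments, the target detection method further includes: pruning the target detection model.
Reducing downsampling in the backbone network and adding detection branches both improve detection of linear or small targets, but they also increase the amount of computation. To reduce the computation introduced by these operations and to lower the risk of overfitting, the target detection model can be pruned.
For example, the network slimming pruning algorithm may be chosen: the target detection model is first trained with sparsity regularization to obtain sparse γ parameters (this requires the target detection model to use batch normalization (BN) layers with γ parameters), and the input channels and/or output channels of the convolutional layers are then pruned based on the sparse γ parameters.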
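The channel-selection step of network slimming can be illustrated in plain Python. This is a simplified sketch, not the application's implementation: it assumes the sparsity training has already produced the γ values, uses a single global threshold over all BN layers, and keeps at least one channel per layer. The function name `channel_masks` and the sample γ values are hypothetical.

```python
def channel_masks(gammas_per_layer, prune_ratio):
    """Mark which channels to keep, using one global threshold on |gamma|."""
    all_gammas = sorted(
        abs(g) for layer in gammas_per_layer.values() for g in layer
    )
    num_pruned = int(len(all_gammas) * prune_ratio)
    # Channels whose |gamma| is at or below the threshold are pruned.
    threshold = all_gammas[num_pruned - 1] if num_pruned > 0 else float("-inf")
    masks = {}
    for name, layer in gammas_per_layer.items():
        mask = [abs(g) > threshold for g in layer]
        if not any(mask):
            # Safety assumption: never remove every channel of a layer.
            strongest = max(range(len(layer)), key=lambda i: abs(layer[i]))
            mask[strongest] = True
        masks[name] = mask
    return masks


# Hypothetical gamma values after sparsity training.
gammas = {"bn1": [0.9, 0.01, 0.5, 0.02], "bn2": [0.003, 0.7, 0.4, 0.8]}
masks = channel_masks(gammas, prune_ratio=0.5)
print(masks)
# {'bn1': [True, False, True, False], 'bn2': [False, True, False, True]}
```

In a real model, each mask would then be used to slice the output channels of the convolution feeding that BN layer and the input channels of the convolution that follows it.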
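An embodiment of the present application further provides a target detection apparatus for implementing the target detection method provided in any of the above embodiments. Specifically, Figure 10 shows a schematic structural diagram of a target detection apparatus according to one embodiment of the present application. As shown in Figure 10, the target detection apparatus 1000 includes:
An adjustment unit 1010, configured to set, based on the characteristics of the target to be detected, at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network.
Here, the target to be detected may be any object that needs to be detected, such as a vehicle or a defect. The "characteristics" do not refer to tensor features obtained by a neural network, but to apparent characteristics such as being slender or small. To distinguish the two, the term "feature map" is used hereinafter for tensor features obtained by a neural network.
As described above, for targets that image as linear and small, too many downsampling operations reduce detection accuracy. The adjustment mode here may therefore be to reduce the number of downsampling operations performed by the downsampling structure, or to reduce their effect.
A construction unit 1020, configured to adjust the downsampling structure of the YOLO-v4 backbone network using the adjustment mode and construct a YOLO-v4-based target detection model.
A detection unit 1030, configured to input a detection image into the target detection model; the target detection model extracts a downsampled feature map of the detection image and obtains a target detection result from the downsampled feature map. The size of the downsampled feature map is determined by the adjusted downsampling structure.
Here, the specific detection approach of YOLO-v4 is not changed. With reference to Figures 1 and 2, the detection branches led out from the downsampling layers apply operations such as upsampling, concatenation, and convolution, and use anchor box groups to obtain the target detection result from the downsampled feature maps. To improve detection, adding detection branches may also be considered; the corresponding embodiments are described later.
It can be seen that the apparatus shown in Figure 10 improves target detection accuracy by adjusting the downsampling structure. Taking industrial defect detection as an example, targets such as scratches and fibers image as linear and small; if the detection image is processed with the original downsampling structure, repeated downsampling noticeably degrades detection performance, and the improved target detection model effectively solves this problem.
In some embodiments of the target detection apparatus, the construction unit 1020 is configured to adjust the stride of at least one downsampling layer in the downsampling structure.
In some embodiments of the target detection apparatus, the construction unit 1020 is configured to delete one or more downsampling layers from the downsampling structure. In some embodiments, the construction unit 1020 is configured to delete either the 1/4 downsampling layer or the 1/32 downsampling layer. In some embodiments, the construction unit 1020 is configured to halve the number of channels of each network structure originally connected after the deleted downsampling layer. In some embodiments, the construction unit 1020 is configured to add a detection branch based on a designated downsampling layer in the adjusted downsampling structure. In some embodiments, the construction unit 1020 is configured to set, according to the added detection branch, the anchor boxes used by each detection branch in the target detection model. In some embodiments, the construction unit 1020 is configured to allocate a first preset number of anchor box groups to the detection branches, where the first preset number is the number of anchor box groups used by the original YOLO-v4 backbone network and every detection branch is allocated at least one anchor box group; or to increase the number of anchor box groups from the first preset number to a second preset number and distribute the second preset number of anchor box groups evenly among the detection branches.
In some embodiments, the target detection apparatus further includes: a pruning unit, configured to prune the target detection model.
It can be understood that the above target detection apparatus can implement each step of the target detection method, performed by the target detection server, provided in the foregoing embodiments. The explanations of the target detection method all apply to the target detection apparatus and are not repeated here.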
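Figure 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to Figure 11, at the hardware level the electronic device includes a processor and optionally also an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. The electronic device may of course also include hardware required by other services.
The processor, the network interface, and the memory may be connected to one another through the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one bidirectional arrow is used in Figure 11, but this does not mean that there is only one bus or one type of bus.
The memory is used to store programs. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into internal memory and then runs it, forming the target detection apparatus at the logical level. The target detection apparatus shown in Figure 11 does not limit the number of target detection apparatuses in the present application. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
Based on the characteristics of the target to be detected, setting at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode to construct a YOLO-v4-based target detection model; and inputting a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.
The method performed by the target detection apparatus disclosed in the embodiment shown in Figure 3 of the present application may be applied to, or implemented by, the processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also perform the method performed by the target detection apparatus in Figure 3 and implement the functions of the target detection apparatus in the embodiment shown in Figure 10, which are not repeated here.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by an electronic device that includes multiple application programs, enable the electronic device to perform the method performed by the target detection apparatus in the embodiment shown in Figure 3, and specifically to perform:
Based on the characteristics of the target to be detected, setting at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode to construct a YOLO-v4-based target detection model; and inputting a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.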
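Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.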
Claims (16)
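- A target detection method, characterized by including: based on the characteristics of the target to be detected, setting at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode to construct a YOLO-v4-based target detection model; and inputting a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.
- The method according to claim 1, characterized in that adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode includes: adjusting the stride of at least one downsampling layer in the downsampling structure.
- The method according to claim 1, characterized in that adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode includes: deleting one or more downsampling layers from the downsampling structure.
- The method according to claim 3, characterized in that adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode includes: deleting either the 1/4 downsampling layer or the 1/32 downsampling layer.
- The method according to claim 3, characterized in that adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode further includes: halving the number of channels of each network structure originally connected after the deleted downsampling layer.
- The method according to claim 1, characterized in that constructing the YOLO-v4-based target detection model includes: adding a detection branch based on a designated downsampling layer in the adjusted downsampling structure.
- The method according to claim 6, characterized in that constructing the YOLO-v4-based target detection model further includes: setting, according to the added detection branch, the anchor boxes used by each detection branch in the target detection model.
- The method according to claim 7, characterized in that setting, according to the added detection branch, the anchor boxes used by each detection branch in the target detection model includes: allocating a first preset number of anchor box groups to the detection branches, where the first preset number is the number of anchor box groups used by the original YOLO-v4 backbone network and every detection branch is allocated at least one anchor box group; or increasing the number of anchor box groups from the first preset number to a second preset number and distributing the second preset number of anchor box groups evenly among the detection branches.
- The method according to any one of claims 1 to 8, characterized in that the method further includes: pruning the target detection model.
- A target detection apparatus, characterized by including: an adjustment unit, configured to set, based on the characteristics of the target to be detected, at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; a construction unit, configured to adjust the downsampling structure of the YOLO-v4 backbone network using the adjustment mode and construct a YOLO-v4-based target detection model; and a detection unit, configured to input a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.
- The apparatus according to claim 10, wherein the construction unit is configured to adjust the stride of at least one downsampling layer in the downsampling structure.
- The apparatus according to claim 10, wherein the construction unit is configured to delete one or more downsampling layers from the downsampling structure.
- The apparatus according to claim 12, wherein the construction unit is configured to delete either the 1/4 downsampling layer or the 1/32 downsampling layer.
- The apparatus according to claim 12, wherein the construction unit is configured to halve the number of channels of each network structure originally connected after the deleted downsampling layer.
- The apparatus according to claim 10, wherein the construction unit is configured to add a detection branch based on a designated downsampling layer in the adjusted downsampling structure.
- An electronic device, including: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following target detection method: based on the characteristics of the target to be detected, setting at least one adjustment mode for the downsampling structure of the YOLO-v4 backbone network; adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment mode to construct a YOLO-v4-based target detection model; and inputting a detection image into the target detection model, the target detection model extracting a downsampled feature map of the detection image and obtaining a target detection result from the downsampled feature map, where the size of the downsampled feature map is determined by the adjusted downsampling structure.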
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110148538.7 | 2021-02-03 | ||
CN202110148538.7A CN112949692A (zh) | 2021-02-03 | 2021-02-03 | Target detection method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022166293A1 (zh) | 2022-08-11 |
Family
ID=76242151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/130102 WO2022166293A1 (zh) | 2021-02-03 | 2021-11-11 | Target detection method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112949692A (zh) |
WO (1) | WO2022166293A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115661614A (zh) * | 2022-12-09 | 2023-01-31 | 江苏稻源科技集团有限公司 | A target detection method based on lightweight YOLO v1 |
CN116363124A (zh) * | 2023-05-26 | 2023-06-30 | 南京杰智易科技有限公司 | A deep-learning-based steel surface defect detection method |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112949692A (zh) * | 2021-02-03 | 2021-06-11 | 歌尔股份有限公司 | Target detection method and apparatus |
CN113962931B (zh) * | 2021-09-08 | 2022-06-24 | 宁波海棠信息技术有限公司 | A foreign-matter defect detection method for reed switches |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
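CN110633594A (zh) * | 2018-06-21 | 2019-12-31 | 北京京东尚科信息技术有限公司 | Target detection method and apparatus |
CN111462050A (zh) * | 2020-03-12 | 2020-07-28 | 上海理工大学 | Improved-YOLOv3 method, apparatus, and storage medium for detecting extremely small targets in remote sensing images |
CN111860064A (zh) * | 2019-04-30 | 2020-10-30 | 杭州海康威视数字技术股份有限公司 | Video-based target detection method, apparatus, device, and storage medium |
CN112949692A (zh) * | 2021-02-03 | 2021-06-11 | 歌尔股份有限公司 | Target detection method and apparatus |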
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3160395A4 (en) * | 2014-06-25 | 2018-08-08 | Canary Medical Inc. | Devices, systems and methods for using and monitoring heart valves |
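CN107316054A (zh) * | 2017-05-26 | 2017-11-03 | 昆山遥矽微电子科技有限公司 | Non-standard character recognition method based on convolutional neural network and support vector machine |
CN110632608B (zh) * | 2018-06-21 | 2022-02-22 | 北京京东乾石科技有限公司 | Target detection method and apparatus based on laser point cloud |
CN109784386B (zh) * | 2018-12-29 | 2020-03-17 | 天津大学 | A method for assisting object detection with semantic segmentation |
CN110503070A (zh) * | 2019-08-29 | 2019-11-26 | 电子科技大学 | Automated traffic monitoring method based on aerial-image target detection processing technology |
CN111488804B (zh) * | 2020-03-19 | 2022-11-11 | 山西大学 | Deep-learning-based method for detecting the wearing of labor protection equipment and identifying identity |
CN111553406B (zh) * | 2020-04-24 | 2023-04-28 | 上海锘科智能科技有限公司 | Target detection system, method, and terminal based on improved yolo-v3 |
CN111738987A (zh) * | 2020-06-01 | 2020-10-02 | 湖南品信生物工程有限公司 | Multi-task method and apparatus for automatic recognition of cervical cancer cells |
CN111899227A (zh) * | 2020-07-06 | 2020-11-06 | 北京交通大学 | Method for automatic collection and identification of railway fastener defects based on UAV operation |
CN111967480A (zh) * | 2020-09-07 | 2020-11-20 | 上海海事大学 | Multi-scale self-attention target detection method based on weight sharing |
CN112232371B (zh) * | 2020-09-17 | 2022-06-10 | 福州大学 | An American license plate recognition method based on YOLOv3 and text recognition |
CN112200773A (zh) * | 2020-09-17 | 2021-01-08 | 苏州慧维智能医疗科技有限公司 | A colorectal polyp detection method based on an encoder and decoder with dilated convolution |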
-
2021
- 2021-02-03 CN CN202110148538.7A patent/CN112949692A/zh active Pending
- 2021-11-11 WO PCT/CN2021/130102 patent/WO2022166293A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170206431A1 (en) * | 2016-01-20 | 2017-07-20 | Microsoft Technology Licensing, Llc | Object detection and classification in images |
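CN110633594A (zh) * | 2018-06-21 | 2019-12-31 | 北京京东尚科信息技术有限公司 | Target detection method and apparatus |
CN111860064A (zh) * | 2019-04-30 | 2020-10-30 | 杭州海康威视数字技术股份有限公司 | Video-based target detection method, apparatus, device, and storage medium |
CN111462050A (zh) * | 2020-03-12 | 2020-07-28 | 上海理工大学 | Improved-YOLOv3 method, apparatus, and storage medium for detecting extremely small targets in remote sensing images |
CN112949692A (zh) * | 2021-02-03 | 2021-06-11 | 歌尔股份有限公司 | Target detection method and apparatus |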
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115661614A (zh) * | 2022-12-09 | 2023-01-31 | 江苏稻源科技集团有限公司 | A target detection method based on lightweight YOLO v1 |
CN115661614B (zh) * | 2022-12-09 | 2024-05-24 | 江苏稻源科技集团有限公司 | A target detection method based on lightweight YOLO v1 |
CN116363124A (zh) * | 2023-05-26 | 2023-06-30 | 南京杰智易科技有限公司 | A deep-learning-based steel surface defect detection method |
Also Published As
Publication number | Publication date |
---|---|
CN112949692A (zh) | 2021-06-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
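WO2022166293A1 (zh) | Target detection method and apparatus | |
WO2022121531A1 (zh) | Product defect detection method and apparatus | |
WO2022166294A1 (zh) | Target detection method and apparatus | |
CN109840883B (zh) | Method, apparatus, and computing device for training an object recognition neural network | |
CN109816659B (zh) | Image segmentation method, apparatus, and system | |
WO2022116720A1 (zh) | Target detection method, apparatus, and electronic device | |
WO2023116632A1 (zh) | Video instance segmentation method and segmentation apparatus based on spatio-temporal memory information | |
WO2021143207A1 (zh) | Image processing method, apparatus, computing processing device, and medium | |
WO2020134703A1 (zh) | Image processing method based on a neural network system, and neural network system | |
US20210295607A1 (en) | Data reading/writing method and system in 3d image processing, storage medium and terminal | |
CN110599453A (zh) | Panel defect detection method, apparatus, and device terminal based on image fusion | |
WO2020010982A1 (zh) | Risk prevention and control method, system, and terminal device | |
CN115187820A (zh) | Lightweight target detection method, apparatus, device, and storage medium | |
WO2020253117A1 (zh) | Data processing method and apparatus | |
WO2019201029A1 (zh) | Candidate box updating method and apparatus | |
CN115631112B (zh) | Deep-learning-based building contour correction method and apparatus | |
CN113674203A (zh) | Defect detection model training method and apparatus, and defect detection method and apparatus | |
WO2021168745A1 (zh) | Training method and apparatus for a magnetic resonance imaging model | |
CN117173070A (zh) | FPGA-based image processing and fusion method and system | |
CN116311462A (zh) | Face image restoration and recognition method combining context information and VGG19 | |
WO2022121521A1 (zh) | Audio signal timing alignment method and apparatus | |
US11664818B2 (en) | Neural network processor for compressing featuremap data and computing system including the same | |
CN113763412A (zh) | Image processing method and apparatus, electronic device, and computer-readable storage medium | |
CN110796115B (zh) | Image detection method and apparatus, electronic device, and readable storage medium | |
CN108280135A (zh) | Method, apparatus, and electronic device for visualizing data structures |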
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21924289; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21924289; Country of ref document: EP; Kind code of ref document: A1 |