WO2022166294A1 - Target detection method and apparatus - Google Patents

Target detection method and apparatus Download PDF

Info

Publication number
WO2022166294A1
WO2022166294A1 · PCT/CN2021/130259 · CN2021130259W
Authority
WO
WIPO (PCT)
Prior art keywords
convolutional layer
pruning
target detection
pruning condition
satisfies
Prior art date
Application number
PCT/CN2021/130259
Other languages
English (en)
French (fr)
Inventor
张一凡
刘杰
Original Assignee
歌尔股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔股份有限公司 filed Critical 歌尔股份有限公司
Publication of WO2022166294A1 publication Critical patent/WO2022166294A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • the present application relates to the technical field of computer vision, and in particular, to a target detection method and device.
  • YOLO (full name: You Only Look Once) is a typical single-stage target detection technology: it regresses information such as the position and category of the target directly from the original image. It has been developed to its fourth version, YOLO-v4.
  • Embodiments of the present application provide a target detection method and apparatus, so as to reduce the amount of calculation in the target detection process.
  • in a first aspect, an embodiment of the present application provides a target detection method, including: constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result.
  • in a second aspect, an embodiment of the present application further provides a target detection apparatus, including: a construction unit for constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; a pruning unit for setting a first pruning condition related to output information, taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer, and pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and a detection unit for inputting a detection image into the target detection model to obtain a target detection result.
  • in a third aspect, embodiments of the present application further provide an electronic device, including: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the above target detection method.
  • in a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the above target detection method.
  • the above at least one technical solution adopted in the embodiments of the present application can achieve the following beneficial effect: by setting a pruning scheme related to output information, the convolutional layers of the residual blocks in the basic model constructed based on YOLO-v4 are selectively pruned to obtain a target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size while still maintaining high target detection accuracy, effectively reducing the amount of computation in the target detection process.
  • FIG. 1 is a schematic flowchart of a target detection method according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a residual block according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
  • Pruning technology has been widely used in the field of neural networks. When the neural network in a target detection model has many parameters but some of them contribute little to the final output and are therefore redundant, pruning can be used, that is, these redundant parameters are cut away.
  • although pruning can reduce the size of the target detection model, pruning at will reduces target detection accuracy. Therefore, how to prune the target detection model reasonably needs to be considered.
  • the technical idea of this application is to build a basic model based on YOLO-v4, select the residual blocks of its backbone network as pruning objects, and selectively prune the convolutional layers of the residual blocks, thereby reducing the size of the target detection model while maintaining high target detection accuracy.
  • FIG. 1 is a schematic flowchart of a target detection method according to an embodiment of the present application. As shown in FIG. 1 , the method includes:
  • step S110: a basic model is constructed based on YOLO-v4, and the backbone network of the basic model includes a plurality of residual blocks.
  • here, the network structure of the basic model can be adjusted as needed in ways other than pruning, such as adding several detection branches or adjusting the downsampling structure of the backbone network to reduce the number of downsampling operations; this application does not limit this.
  • the backbone network may include multiple residual blocks as shown in FIG. 2 .
  • the residual block shown in FIG. 2 contains seven convolutional layers (conv): convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270.
  • in addition, an XOR operation and a concatenation operation (concat) are performed in combination with the outputs of the corresponding convolutional layers.
  • taking the topmost convolutional layer 210 as an example, 3*3 represents the size of the convolution kernel used by convolutional layer 210, and the 128 in parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described again.
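  • As an illustrative aid only (not part of the original disclosure), the sketch below shows one PyTorch-style way such a seven-layer residual block could be assembled. The wiring is inferred from the per-layer pruning discussion later in this description; the kernel sizes other than conv210's 3*3, the channel counts other than 128, the activation, and the exact branch connections are assumptions rather than details taken from FIG. 2.
```python
# Illustrative sketch only: a plausible PyTorch rendering of the residual block of
# FIG. 2. Everything not stated in the description (kernel sizes other than
# conv210's 3*3, channel counts other than 128, the activation, the exact split of
# the two CSP branches) is an assumption.
import torch
import torch.nn as nn

def conv_bn(c_in, c_out, k):
    # conv followed by the BN layer whose gamma parameters later drive pruning
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1, inplace=True),  # activation choice is an assumption
    )

class ResidualBlockSketch(nn.Module):
    def __init__(self, c_in=128, c=128):
        super().__init__()
        self.conv210 = conv_bn(c_in, c, 3)   # 3*3 kernel, 128 channels
        self.conv220 = conv_bn(c, c, 1)      # cross-stage branch kept for the final concat
        self.conv230 = conv_bn(c_in, c, 1)   # "second convolutional layer": feeds the residual structure
        self.conv240 = conv_bn(c, c, 1)      # residual structure
        self.conv250 = conv_bn(c, c, 3)      # last conv of the residual structure
        self.conv260 = conv_bn(c, c, 1)
        self.conv270 = conv_bn(2 * c, c, 1)  # consumes the concat of conv220 and conv260 outputs

    def forward(self, x):
        a = self.conv220(self.conv210(x))
        b = self.conv230(x)                    # assumption: conv230 consumes the block input directly
        b = b + self.conv250(self.conv240(b))  # element-wise merge (called an XOR operation in the text)
        b = self.conv260(b)
        return self.conv270(torch.cat([a, b], dim=1))
```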
  • Step S120: a first pruning condition related to output information is set; the convolutional layer in each residual block that satisfies the first pruning condition is taken as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition is taken as a second convolutional layer.
  • Step S130: each first convolutional layer is pruned and each second convolutional layer is not pruned, to obtain the target detection model.
  • the pruning here may include pruning of input channels and/or pruning of output channels.
  • step S140: the detection image is input into the target detection model to obtain the target detection result.
  • specifically, the detected target may be a vehicle, a pedestrian, a defect of an industrial product, and so on.
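  • As a hedged illustration of step S140 (not from the patent), a trivial inference wrapper might look as follows; the 416x416 input size and the output format are assumptions that depend on the detection head.
```python
# Hedged sketch of step S140: running the pruned target detection model on a
# detection image. The input size and the output format are assumptions.
import torch

def detect(target_detection_model, image_tensor):
    """image_tensor: a (1, 3, H, W) float tensor built from the detection image."""
    target_detection_model.eval()
    with torch.no_grad():
        return target_detection_model(image_tensor)  # boxes / classes / scores, depending on the head

# example with a dummy tensor standing in for a preprocessed detection image:
# result = detect(pruned_model, torch.zeros(1, 3, 416, 416))
```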
  • it can be seen that the method shown in FIG. 1, by setting a pruning scheme related to output information, selectively prunes the convolutional layers of the residual blocks in the basic model constructed based on YOLO-v4 to obtain a target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size while still maintaining high target detection accuracy, effectively reducing the amount of computation in the target detection process.
  • in some embodiments of the target detection method, taking the convolutional layer in each residual block that does not satisfy the first pruning condition as the second convolutional layer includes: if the output information of a convolutional layer is the input information of the residual structure in the residual block, that convolutional layer does not satisfy the first pruning condition.
  • the reason for this is that, in the basic model constructed based on YOLO-v4, the residual block has a CSP (Cross Stage Partial) structure. If the second convolutional layer (whose output information is the input information of the residual structure in the residual block) were pruned, the number of channels output to the residual structure might change, affecting the use of the residual structure.
  • it should be noted here that the output information of the second convolutional layer may also undergo batch normalization before being used as the input information of the residual structure in the residual block, but it is not convolved again by other convolutional layers.
  • taking FIG. 2 as an example, convolutional layer 240, convolutional layer 250, and the XOR calculation below them constitute a residual structure. Since the output information of convolutional layer 230, shown with a dashed box, is the input information of the residual structure, convolutional layer 230 is the only second convolutional layer in this residual block that cannot be pruned.
  • in some embodiments of the target detection method, pruning each first convolutional layer includes: setting a second pruning condition related to the position of the convolutional layer; judging whether each first convolutional layer satisfies the second pruning condition; and, if a first convolutional layer satisfies the second pruning condition, pruning the output channels of that first convolutional layer.
  • as mentioned above, the residual block has a CSP structure; setting the second pruning condition according to the position of the convolutional layer allows the residual structure to be used reasonably.
  • in some embodiments, judging whether each first convolutional layer satisfies the second pruning condition includes: if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, that first convolutional layer does not satisfy the second pruning condition; otherwise it satisfies the second pruning condition.
  • taking FIG. 2 as an example, convolutional layer 250 is the last convolutional layer of the residual structure in the residual block. To ensure the accuracy of the subsequent XOR calculation, the output channels of convolutional layer 250 cannot be pruned, while the output channels of the convolutional layers other than convolutional layer 250 and convolutional layer 230 can be pruned.
  • pruning can be performed in a variety of ways, and any existing technique may be chosen. Preferably, in some embodiments, pruning the output channels of the first convolutional layer includes: performing network slimming pruning on the output channels of the first convolutional layer according to the γ parameters of the BN layer connected after that first convolutional layer.
  • here, the basic model needs to use BN (Batch Normalization) layers, and the basic model is first sparsity-trained so that the γ parameters of each BN layer become sparse, thereby meeting the conditions for using network slimming pruning.
  • the specific network slimming pruning operation can be implemented with reference to the prior art and is not repeated here; a minimal sketch of the γ-based channel selection is given below.
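  • For concreteness, the sketch below illustrates one γ-based channel-selection step in the spirit of network slimming, assuming the model has already been sparsity-trained as described above. The 30% prune ratio, the single global threshold, and the helper names are illustrative assumptions, not details from the patent.
```python
# Hedged sketch of gamma-based channel selection in the spirit of network slimming:
# output channels whose BN gamma falls below a global percentile threshold are
# dropped. Only BN layers that follow prunable first convolutional layers are passed in.
import torch
import torch.nn as nn

def select_kept_channels(model, prunable_bn_names, prune_ratio=0.3):
    bn_layers = {name: m for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}
    gammas = torch.cat([bn_layers[n].weight.detach().abs() for n in prunable_bn_names])
    threshold = torch.quantile(gammas, prune_ratio)  # one global threshold over the sparse gammas
    kept = {}
    for n in prunable_bn_names:
        g = bn_layers[n].weight.detach().abs()
        mask = g > threshold
        if not bool(mask.any()):      # keep at least one channel per layer
            mask[g.argmax()] = True
        kept[n] = mask                # boolean keep-mask over the layer's output channels
    return kept
```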
  • in some embodiments, pruning each first convolutional layer includes: setting a third pruning condition related to input information; judging whether each first convolutional layer satisfies the third pruning condition; and, if a first convolutional layer satisfies the third pruning condition, pruning the input channels of that first convolutional layer.
  • the input information here corresponds to the output information discussed above; the difference is that the output information is produced by the convolutional layer itself, while the input information is received by the convolutional layer and may be the output of another network structure, the original image, and so on. The input channels therefore need to be pruned reasonably according to the situation.
  • in some embodiments, judging whether each first convolutional layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the output information of the second convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
  • since the second convolutional layer cannot be pruned, the input channels of a convolutional layer that receives the output information of the second convolutional layer cannot be pruned either, to keep them consistent.
  • in some embodiments, judging whether each first convolutional layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the detection image, that first convolutional layer does not satisfy the third pruning condition.
  • of course, the detection image here may have undergone some preprocessing, that is, this also covers the case where the input information is a tensor representation of the detection image. In these cases, since the input information is fixed, the input channels cannot be pruned.
  • in some embodiments, judging whether each first convolutional layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the result of an XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
  • to ensure the normal use of the residual structure, the number of channels of the result of the XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer is also fixed; therefore, when this result is used as the input information of a first convolutional layer, the input channels of that first convolutional layer cannot be pruned either.
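  • The first, second, and third pruning conditions described above can be summarized as two simple predicates. The sketch below is illustrative only; the metadata field names are invented for this example and are not terminology from the patent.
```python
# Illustrative summary of the three pruning conditions as two predicates; the
# metadata fields (feeds_residual, last_in_residual, input_source) are invented
# for this example and are not terminology from the patent.
def output_channels_prunable(layer):
    # First condition: a layer whose output is the residual structure's input is a
    # second convolutional layer and is never pruned.
    if layer["feeds_residual"]:
        return False
    # Second condition: the last convolutional layer of the residual structure keeps
    # its output channels so the element-wise (XOR) merge stays shape-consistent.
    return not layer["last_in_residual"]

def input_channels_prunable(layer):
    if layer["feeds_residual"]:
        return False
    # Third condition: inputs that are fixed from this layer's point of view (the
    # detection image, a second convolutional layer's output, or the merge of a
    # second layer's output with another layer's output) cannot lose channels.
    return layer["input_source"] not in ("image", "second_conv", "residual_merge")
```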
  • in summary, taking FIG. 2 as an example: Convolutional layer 210: if its input information is the detection image, its input channels cannot be pruned; if its input information is not the detection image, its input channels can be pruned according to the input information. In either case, its output channels can be pruned according to the γ parameter of the BN layer that follows it.
  • Convolutional layer 220: its input channels can be pruned according to the output information of convolutional layer 210, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
  • Convolutional layer 230: neither its input channels nor its output channels can be pruned.
  • Convolutional layer 240: its input channels cannot be pruned, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
  • Convolutional layer 250: its input channels can be pruned according to the output information of convolutional layer 240, and its output channels cannot be pruned.
  • Convolutional layer 260: its input channels cannot be pruned, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
  • Convolutional layer 270: its input channels are pruned according to the concatenated outputs of convolutional layer 220 and convolutional layer 260, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
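  • As a usage example, the per-layer conclusions above for FIG. 2 can be encoded as data and checked against the predicates sketched after the third pruning condition; the field values below simply restate the description and are otherwise illustrative.
```python
# The per-layer conclusions above for the residual block of FIG. 2, encoded as data
# and checked with the predicates sketched after the third pruning condition.
fig2_layers = {
    "conv210": {"feeds_residual": False, "last_in_residual": False, "input_source": "image"},
    "conv220": {"feeds_residual": False, "last_in_residual": False, "input_source": "conv210"},
    "conv230": {"feeds_residual": True,  "last_in_residual": False, "input_source": "block_input"},
    "conv240": {"feeds_residual": False, "last_in_residual": False, "input_source": "second_conv"},
    "conv250": {"feeds_residual": False, "last_in_residual": True,  "input_source": "conv240"},
    "conv260": {"feeds_residual": False, "last_in_residual": False, "input_source": "residual_merge"},
    "conv270": {"feeds_residual": False, "last_in_residual": False, "input_source": "concat(220, 260)"},
}
for name, meta in fig2_layers.items():
    print(name, "input prunable:", input_channels_prunable(meta),
          "output prunable:", output_channels_prunable(meta))
# Expected: conv230 -> neither; conv250 -> output not prunable; conv240 and conv260 ->
# input not prunable; conv210 -> input not prunable when its input is the detection image.
```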
  • taking the pruning of convolutional layers 210 to 270 above as an example, target detection model 1 is obtained, and target detection model 2 is constructed with the original YOLO-v4; both are trained on the same sample set and evaluated on the same test set. Compared with target detection model 2, target detection model 1 is not only much smaller in size but also shows small improvements in both mean average precision (mAP) and precision, indicating that the pruning did not reduce model performance and even improved it, which is a non-obvious effect.
  • an embodiment of the present application further provides a target detection apparatus, which is used to implement the target detection method of any of the above embodiments.
  • FIG. 3 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application.
  • the target detection apparatus 300 includes:
  • the construction unit 310 is configured to construct a basic model based on YOLO-v4, and the backbone network of the basic model includes a plurality of residual blocks.
  • here, the network structure of the basic model can be adjusted as needed in ways other than pruning, such as adding several detection branches or adjusting the downsampling structure of the backbone network to reduce the number of downsampling operations; this application does not limit this.
  • the backbone network may include multiple residual blocks as shown in FIG. 2 .
  • the residual block shown in FIG. 2 contains seven convolutional layers (conv): convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270.
  • taking the topmost convolutional layer 210 as an example, 3*3 represents the size of the convolution kernel used by convolutional layer 210, and the 128 in parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described again.
  • the pruning unit 320 is used to set a first pruning condition related to output information; take the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; and prune each first convolutional layer and not prune each second convolutional layer to obtain a target detection model.
  • the pruning here may include pruning of input channels and/or pruning of output channels.
  • the detection unit 330 is configured to input the detection image into the target detection model to obtain the target detection result.
  • it can be seen that the apparatus shown in FIG. 3, by setting a pruning scheme related to output information, selectively prunes the convolutional layers of the residual blocks in the basic model constructed based on YOLO-v4 to obtain a target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size while still maintaining high target detection accuracy, effectively reducing the amount of computation in the target detection process.
  • in some embodiments, the pruning unit 320 is configured so that, if the output information of a convolutional layer is the input information of the residual structure in the residual block, that convolutional layer does not satisfy the first pruning condition.
  • in some embodiments, the pruning unit 320 is configured to set a second pruning condition related to the position of the convolutional layer; determine whether each first convolutional layer satisfies the second pruning condition; and, if a first convolutional layer satisfies the second pruning condition, prune the output channels of that first convolutional layer.
  • in some embodiments, the pruning unit 320 is configured so that, if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, that first convolutional layer does not satisfy the second pruning condition; otherwise it satisfies the second pruning condition.
  • the pruning unit 320 is configured to perform network slimming pruning on the output channel of the first convolutional layer according to the ⁇ parameter of the BN layer connected after the first convolutional layer .
  • in some embodiments, the pruning unit 320 is configured to set a third pruning condition related to input information; determine whether each first convolutional layer satisfies the third pruning condition; and, if a first convolutional layer satisfies the third pruning condition, prune the input channels of that first convolutional layer.
  • in some embodiments, the pruning unit 320 is configured so that, if the input information received by a first convolutional layer is the output information of the second convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
  • the pruning unit 320 is configured to, if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
  • in some embodiments, the pruning unit 320 is configured so that, if the input information received by a first convolutional layer is the result of an XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
  • it can be understood that the above target detection apparatus can implement each step of the target detection method provided in the foregoing embodiments; the relevant explanations of the target detection method apply to the target detection apparatus and are not repeated here.
  • FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device includes a processor, optionally an internal bus, a network interface, and a memory.
  • the memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage.
  • the electronic equipment may also include hardware required for other services.
  • the processor, the network interface, and the memory can be connected to each other through an internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like.
  • the bus can be divided into address bus, data bus, control bus and so on. For ease of presentation, only one bidirectional arrow is used in FIG. 4, but it does not mean that there is only one bus or one type of bus.
  • the program may include program code, and the program code includes computer operation instructions.
  • the memory may include memory and non-volatile memory and provide instructions and data to the processor.
  • the processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming a target detection device on a logical level.
  • the target detection apparatus shown in FIG. 4 does not constitute a limitation of the present application in terms of number.
  • the processor executes the program stored in the memory, and is specifically used to perform the following operations:
  • a basic model is constructed based on YOLO-v4, and the backbone network of the basic model includes a plurality of residual blocks; a first pruning condition related to output information is set; the convolutional layer in each residual block that satisfies the first pruning condition is taken as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition is taken as a second convolutional layer; each first convolutional layer is pruned and each second convolutional layer is not pruned, to obtain a target detection model; a detection image is input into the target detection model to obtain a target detection result.
  • the above-mentioned method performed by the target detection apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor, or implemented by a processor.
  • a processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the above-mentioned method can be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the electronic device can also perform the method performed by the target detection apparatus in FIG. 1 , and implement the functions of the target detection apparatus in the embodiment shown in FIG. 3 , and details are not described herein again in this embodiment of the present application.
  • An embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and the one or more programs include instructions, and the instructions are executed by an electronic device including multiple application programs.
  • the electronic device can be made to execute the method executed by the target detection apparatus in the embodiment shown in FIG. 1 , and is specifically used to execute:
  • a basic model is constructed based on YOLO-v4, and the backbone network of the basic model includes a plurality of residual blocks; a first pruning condition related to output information is set; the convolutional layer in each residual block that satisfies the first pruning condition is taken as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition is taken as a second convolutional layer; each first convolutional layer is pruned and each second convolutional layer is not pruned, to obtain a target detection model; a detection image is input into the target detection model to obtain a target detection result.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a target detection method and apparatus. The method includes: constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result. This technical solution can maintain high target detection accuracy and effectively reduces the amount of computation in the target detection process.

Description

Target detection method and apparatus
Technical field
The present application relates to the technical field of computer vision, and in particular to a target detection method and apparatus.
Background of the invention
YOLO (full name: You Only Look Once) is a typical single-stage target detection technology: it regresses information such as the position and category of the target directly from the original image. It has been developed to its fourth version, YOLO-v4.
In practical applications, users often first build a target detection model based on YOLO-v4 and then adjust the network structure of the target detection model according to actual needs. These adjustments may bring more computation, so how to reduce the amount of computation in the target detection process is a problem that needs to be solved.
It should be noted that the statements here merely provide background information related to the present application and do not necessarily constitute prior art.
Summary of the invention
Embodiments of the present application provide a target detection method and apparatus to reduce the amount of computation in the target detection process.
Embodiments of the present application adopt the following technical solutions:
In a first aspect, an embodiment of the present application provides a target detection method, including: constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result.
In a second aspect, an embodiment of the present application further provides a target detection apparatus, including: a construction unit for constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; a pruning unit for setting a first pruning condition related to output information, taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer, and pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and a detection unit for inputting a detection image into the target detection model to obtain a target detection result.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the above target detection method.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing one or more programs which, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the above target detection method.
The above at least one technical solution adopted in the embodiments of the present application can achieve the following beneficial effect: by setting a pruning scheme related to output information, the convolutional layers of the residual blocks in the basic model constructed based on YOLO-v4 are selectively pruned to obtain a target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size while still maintaining high target detection accuracy, effectively reducing the amount of computation in the target detection process.
Brief description of the drawings
The drawings described here are used to provide a further understanding of the present application and constitute a part of the present application. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not constitute an undue limitation of the present application. In the drawings:
FIG. 1 is a schematic flowchart of a target detection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a residual block according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a target detection apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Pruning technology has been widely used in the field of neural networks. When the neural network in a target detection model has many parameters but some of them contribute little to the final output and are therefore redundant, pruning can be used, that is, these redundant parameters are cut away.
Although pruning can reduce the size of the target detection model, pruning at will reduces target detection accuracy, so how to prune the target detection model reasonably needs to be considered.
The technical idea of the present application is to build a basic model based on YOLO-v4, select the residual blocks of its backbone network as pruning objects, and selectively prune the convolutional layers of the residual blocks, thereby reducing the size of the target detection model while maintaining high target detection accuracy.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
FIG. 1 is a schematic flowchart of a target detection method according to an embodiment of the present application. As shown in FIG. 1, the method includes:
Step S110: a basic model is constructed based on YOLO-v4, and the backbone network of the basic model includes a plurality of residual blocks.
Here, the network structure of the basic model can be adjusted as needed in ways other than pruning, such as adding several detection branches or adjusting the downsampling structure of the backbone network to reduce the number of downsampling operations; this application does not limit this.
In some embodiments, the backbone network may include multiple residual blocks as shown in FIG. 2. The residual block shown in FIG. 2 contains seven convolutional layers (conv): convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270. In addition, an XOR operation [shown as an inline symbol in the original, PCTCN2021130259-appb-000001] and a concatenation operation (concat) are performed in combination with the outputs of the corresponding convolutional layers.
Taking the topmost convolutional layer 210 as an example, 3*3 represents the size of the convolution kernel used by convolutional layer 210, and the 128 in parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described again.
Step S120: a first pruning condition related to output information is set; the convolutional layer in each residual block that satisfies the first pruning condition is taken as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition is taken as a second convolutional layer.
Step S130: each first convolutional layer is pruned and each second convolutional layer is not pruned, to obtain a target detection model.
The pruning here may include pruning of input channels and/or pruning of output channels.
Step S140: the detection image is input into the target detection model to obtain the target detection result. Specifically, the detected target may be a vehicle, a pedestrian, a defect of an industrial product, and so on.
It can be seen that the method shown in FIG. 1, by setting a pruning scheme related to output information, selectively prunes the convolutional layers of the residual blocks in the basic model constructed based on YOLO-v4 to obtain a target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size while still maintaining high target detection accuracy, effectively reducing the amount of computation in the target detection process.
In some embodiments of the target detection method, taking the convolutional layer in each residual block that does not satisfy the first pruning condition as the second convolutional layer includes: if the output information of a convolutional layer is the input information of the residual structure in the residual block, that convolutional layer does not satisfy the first pruning condition.
The reason for this is that, in the basic model constructed based on YOLO-v4, the residual block has a CSP (Cross Stage Partial) structure. If the second convolutional layer (whose output information is the input information of the residual structure in the residual block) were pruned, the number of channels output to the residual structure might change, affecting the use of the residual structure.
It should be noted here that the output information of the second convolutional layer may also undergo batch normalization before being used as the input information of the residual structure in the residual block, but it is not convolved again by other convolutional layers.
Taking FIG. 2 as an example, convolutional layer 240, convolutional layer 250, and the XOR calculation below them constitute a residual structure. Since the output information of convolutional layer 230, shown with a dashed box, is the input information of the residual structure, convolutional layer 230 is the only second convolutional layer in this residual block that cannot be pruned.
In some embodiments of the target detection method, pruning each first convolutional layer includes: setting a second pruning condition related to the position of the convolutional layer; judging whether each first convolutional layer satisfies the second pruning condition; and, if a first convolutional layer satisfies the second pruning condition, pruning the output channels of that first convolutional layer.
As mentioned above, the residual block has a CSP structure; setting the second pruning condition according to the position of the convolutional layer allows the residual structure to be used reasonably.
In some embodiments of the target detection method, judging whether each first convolutional layer satisfies the second pruning condition includes: if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, that first convolutional layer does not satisfy the second pruning condition; otherwise it satisfies the second pruning condition.
Taking FIG. 2 as an example, convolutional layer 250 is the last convolutional layer of the residual structure in the residual block. To ensure the accuracy of the subsequent XOR calculation, the output channels of convolutional layer 250 cannot be pruned, while the output channels of the convolutional layers other than convolutional layer 250 and convolutional layer 230 can be pruned.
Pruning can be performed in a variety of ways, and any existing technique may be chosen. Preferably, in some embodiments of the target detection method, pruning the output channels of the first convolutional layer includes: performing network slimming pruning on the output channels of the first convolutional layer according to the γ parameters of the BN layer connected after that first convolutional layer.
Here, the basic model needs to use BN (Batch Normalization) layers, and the basic model is first sparsity-trained so that the γ parameters of each BN layer become sparse, thereby meeting the conditions for using network slimming pruning. The specific network slimming pruning operation can be implemented with reference to the prior art and is not repeated here.
In some embodiments of the target detection method, pruning each first convolutional layer includes: setting a third pruning condition related to input information; judging whether each first convolutional layer satisfies the third pruning condition; and, if a first convolutional layer satisfies the third pruning condition, pruning the input channels of that first convolutional layer.
The input information here corresponds to the output information discussed above; the difference is that the output information is produced by the convolutional layer itself, while the input information is received by the convolutional layer and may be the output of another network structure, the original image, and so on. The input channels therefore need to be pruned reasonably according to the situation.
In some embodiments of the target detection method, judging whether each first convolutional layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the output information of the second convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
Since the second convolutional layer cannot be pruned, the input channels of a convolutional layer that receives the output information of the second convolutional layer cannot be pruned either, to keep them consistent.
In some embodiments of the target detection method, judging whether each first convolutional layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the detection image, that first convolutional layer does not satisfy the third pruning condition.
Of course, the detection image here may have undergone some preprocessing, that is, this also covers the case where the input information is a tensor representation of the detection image. In these cases, since the input information is fixed, the input channels cannot be pruned.
In some embodiments of the target detection method, judging whether each first convolutional layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the result of an XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
To ensure the normal use of the residual structure, the number of channels of the result of the XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer is also fixed; therefore, when this result is used as the input information of a first convolutional layer, the input channels of that first convolutional layer cannot be pruned either.
In summary, taking FIG. 2 as an example:
Convolutional layer 210: if its input information is the detection image, its input channels cannot be pruned; if its input information is not the detection image, its input channels can be pruned according to the input information. In either case, its output channels can be pruned according to the γ parameter of the BN layer that follows it.
Convolutional layer 220: its input channels can be pruned according to the output information of convolutional layer 210, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
Convolutional layer 230: neither its input channels nor its output channels can be pruned.
Convolutional layer 240: its input channels cannot be pruned, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
Convolutional layer 250: its input channels can be pruned according to the output information of convolutional layer 240, and its output channels cannot be pruned.
Convolutional layer 260: its input channels cannot be pruned, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
Convolutional layer 270: its input channels are pruned according to the concatenated outputs of convolutional layer 220 and convolutional layer 260, and its output channels can be pruned according to the γ parameter of the BN layer that follows it.
Taking the pruning of convolutional layers 210 to 270 above as an example, target detection model 1 is obtained, and target detection model 2 is constructed with the original YOLO-v4; both are trained on the same sample set and evaluated on the same test set. Compared with target detection model 2, target detection model 1 is not only much smaller in size but also shows small improvements in both mean average precision (mAP) and precision, indicating that the pruning did not reduce model performance and even improved it, which is a non-obvious effect.
In addition, an embodiment of the present application further provides a target detection apparatus for implementing the target detection method of any of the above embodiments.
FIG. 3 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application. As shown in FIG. 3, the target detection apparatus 300 includes:
a construction unit 310, configured to construct a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks.
Here, the network structure of the basic model can be adjusted as needed in ways other than pruning, such as adding several detection branches or adjusting the downsampling structure of the backbone network to reduce the number of downsampling operations; this application does not limit this.
In some embodiments, the backbone network may include multiple residual blocks as shown in FIG. 2. The residual block shown in FIG. 2 contains seven convolutional layers (conv): convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270. Taking the topmost convolutional layer 210 as an example, 3*3 represents the size of the convolution kernel used by convolutional layer 210, and the 128 in parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described again.
A pruning unit 320, configured to set a first pruning condition related to output information; take the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; and prune each first convolutional layer and not prune each second convolutional layer to obtain a target detection model.
The pruning here may include pruning of input channels and/or pruning of output channels.
A detection unit 330, configured to input the detection image into the target detection model to obtain the target detection result.
It can be seen that the apparatus shown in FIG. 3, by setting a pruning scheme related to output information, selectively prunes the convolutional layers of the residual blocks in the basic model constructed based on YOLO-v4 to obtain a target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size while still maintaining high target detection accuracy, effectively reducing the amount of computation in the target detection process.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured so that, if the output information of a convolutional layer is the input information of the residual structure in the residual block, that convolutional layer does not satisfy the first pruning condition.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured to set a second pruning condition related to the position of the convolutional layer; judge whether each first convolutional layer satisfies the second pruning condition; and, if a first convolutional layer satisfies the second pruning condition, prune the output channels of that first convolutional layer.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured so that, if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, that first convolutional layer does not satisfy the second pruning condition; otherwise it satisfies the second pruning condition.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured to perform network slimming pruning on the output channels of the first convolutional layer according to the γ parameters of the BN layer connected after that first convolutional layer.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured to set a third pruning condition related to input information; judge whether each first convolutional layer satisfies the third pruning condition; and, if a first convolutional layer satisfies the third pruning condition, prune the input channels of that first convolutional layer.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured so that, if the input information received by a first convolutional layer is the output information of the second convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured so that, if the input information received by a first convolutional layer is the detection image, that first convolutional layer does not satisfy the third pruning condition.
In some embodiments of the target detection apparatus, the pruning unit 320 is configured so that, if the input information received by a first convolutional layer is the result of an XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer, that first convolutional layer does not satisfy the third pruning condition.
It can be understood that the above target detection apparatus can implement each step of the target detection method provided in the foregoing embodiments; the relevant explanations of the target detection method apply to the target detection apparatus and are not repeated here.
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to FIG. 4, at the hardware level, the electronic device includes a processor and optionally further includes an internal bus, a network interface, and a memory. The memory may include internal memory, such as high-speed random-access memory (RAM), and may also include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory can be connected to each other through the internal bus, which can be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one bidirectional arrow is used in FIG. 4, but this does not mean that there is only one bus or one type of bus.
The memory is used to store a program. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory may include internal memory and non-volatile memory and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming a target detection apparatus at the logical level. The target detection apparatus shown in FIG. 4 does not constitute a limitation of the present application in terms of number. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result.
The above method performed by the target detection apparatus disclosed in the embodiment shown in FIG. 1 of the present application may be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by a hardware integrated logic circuit in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device can also perform the method performed by the target detection apparatus in FIG. 1 and implement the functions of the target detection apparatus in the embodiment shown in FIG. 3, which are not described again here in the embodiments of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the target detection apparatus in the embodiment shown in FIG. 1, and are specifically used to perform:
constructing a basic model based on YOLO-v4, where the backbone network of the basic model includes a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and the convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; pruning each first convolutional layer and not pruning each second convolutional layer to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. Information may be computer-readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of the claims of the present application.

Claims (14)

  1. A target detection method, comprising:
    constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks;
    setting a first pruning condition related to output information;
    taking a convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and taking a convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer;
    pruning each first convolutional layer and not pruning each second convolutional layer, to obtain a target detection model;
    inputting a detection image into the target detection model to obtain a target detection result.
  2. The method according to claim 1, wherein taking a convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer comprises:
    if the output information of a convolutional layer is the input information of a residual structure in the residual block, the convolutional layer does not satisfy the first pruning condition.
  3. The method according to claim 1, wherein pruning each first convolutional layer comprises:
    setting a second pruning condition related to the position of the convolutional layer;
    judging whether each first convolutional layer satisfies the second pruning condition;
    if a first convolutional layer satisfies the second pruning condition, pruning output channels of the first convolutional layer.
  4. The method according to claim 3, wherein judging whether each first convolutional layer satisfies the second pruning condition comprises:
    if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, the first convolutional layer does not satisfy the second pruning condition; otherwise the first convolutional layer satisfies the second pruning condition.
  5. The method according to claim 3, wherein pruning the output channels of the first convolutional layer comprises:
    performing network slimming pruning on the output channels of the first convolutional layer according to γ parameters of a BN layer connected after the first convolutional layer.
  6. The method according to claim 1, wherein pruning each first convolutional layer comprises:
    setting a third pruning condition related to input information;
    judging whether each first convolutional layer satisfies the third pruning condition;
    if a first convolutional layer satisfies the third pruning condition, pruning input channels of the first convolutional layer.
  7. The method according to claim 6, wherein judging whether each first convolutional layer satisfies the third pruning condition comprises:
    if the input information received by a first convolutional layer is the output information of the second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
  8. The method according to claim 6, wherein judging whether each first convolutional layer satisfies the third pruning condition comprises:
    if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
  9. The method according to claim 6, wherein judging whether each first convolutional layer satisfies the third pruning condition comprises:
    if the input information received by a first convolutional layer is the result of an XOR calculation between the output information of the second convolutional layer and the output information of another convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
  10. A target detection apparatus, comprising:
    a construction unit, configured to construct a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks;
    a pruning unit, configured to set a first pruning condition related to output information; take a convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and a convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; and prune each first convolutional layer and not prune each second convolutional layer, to obtain a target detection model;
    a detection unit, configured to input a detection image into the target detection model to obtain a target detection result.
  11. The apparatus according to claim 10, wherein
    the pruning unit is configured so that, if the output information of a convolutional layer is the input information of a residual structure in the residual block, the convolutional layer does not satisfy the first pruning condition.
  12. The apparatus according to claim 10, wherein
    the pruning unit is configured to set a second pruning condition related to the position of the convolutional layer; judge whether each first convolutional layer satisfies the second pruning condition; and, if a first convolutional layer satisfies the second pruning condition, prune output channels of the first convolutional layer.
  13. The apparatus according to claim 10, wherein
    the pruning unit is configured to set a third pruning condition related to input information; judge whether each first convolutional layer satisfies the third pruning condition; and, if a first convolutional layer satisfies the third pruning condition, prune input channels of the first convolutional layer.
  14. An electronic device, comprising: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the following target detection method:
    constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to output information; taking a convolutional layer in each residual block that satisfies the first pruning condition as a first convolutional layer, and a convolutional layer in each residual block that does not satisfy the first pruning condition as a second convolutional layer; pruning each first convolutional layer and not pruning each second convolutional layer, to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result.
PCT/CN2021/130259 2021-02-03 2021-11-12 Target detection method and apparatus WO2022166294A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110148533.4A CN112836751A (zh) 2021-02-03 2021-02-03 Target detection method and apparatus
CN202110148533.4 2021-02-03

Publications (1)

Publication Number Publication Date
WO2022166294A1 true WO2022166294A1 (zh) 2022-08-11

Family

ID=75931845

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/130259 WO2022166294A1 (zh) 2021-02-03 2021-11-12 一种目标检测方法和装置

Country Status (2)

Country Link
CN (1) CN112836751A (zh)
WO (1) WO2022166294A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775381A (zh) * 2022-12-15 2023-03-10 华洋通信科技股份有限公司 一种光照不均匀下的矿井电机车路况识别方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836751A (zh) * 2021-02-03 2021-05-25 歌尔股份有限公司 一种目标检测方法和装置
CN113705775A (zh) * 2021-07-29 2021-11-26 浪潮电子信息产业股份有限公司 一种神经网络的剪枝方法、装置、设备及存储介质
CN116468100B (zh) * 2023-03-06 2024-05-10 美的集团(上海)有限公司 残差剪枝方法、装置、电子设备和可读存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095795A1 (en) * 2017-03-15 2019-03-28 Samsung Electronics Co., Ltd. System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN111008640A (zh) * 2019-10-17 2020-04-14 平安科技(深圳)有限公司 图像识别模型训练及图像识别方法、装置、终端及介质
CN111768372A (zh) * 2020-06-12 2020-10-13 国网智能科技股份有限公司 一种gis设备腔体内部异物检测方法及系统
CN111931901A (zh) * 2020-07-02 2020-11-13 华为技术有限公司 一种神经网络构建方法以及装置
CN112836751A (zh) * 2021-02-03 2021-05-25 歌尔股份有限公司 一种目标检测方法和装置

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190362235A1 (en) * 2018-05-23 2019-11-28 Xiaofan Xu Hybrid neural network pruning
CN109460613A (zh) * 2018-11-12 2019-03-12 北京迈格威科技有限公司 模型裁剪方法及装置
US20200160185A1 (en) * 2018-11-21 2020-05-21 Nvidia Corporation Pruning neural networks that include element-wise operations
CN109740534B (zh) * 2018-12-29 2021-06-25 北京旷视科技有限公司 图像处理方法、装置及处理设备
CN110033083B (zh) * 2019-03-29 2023-08-29 腾讯科技(深圳)有限公司 卷积神经网络模型压缩方法和装置、存储介质及电子装置
CN110633747A (zh) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 目标检测器的压缩方法、装置、介质以及电子设备
CN111178133A (zh) * 2019-12-03 2020-05-19 哈尔滨工程大学 一种基于剪枝深度模型用于自然场景图像文本识别方法
CN111126501B (zh) * 2019-12-26 2022-09-16 厦门市美亚柏科信息股份有限公司 一种图像识别方法、终端设备及存储介质
CN111652366A (zh) * 2020-05-09 2020-09-11 哈尔滨工业大学 一种基于通道剪枝和量化训练的联合神经网络模型压缩方法
CN111652370A (zh) * 2020-05-28 2020-09-11 成都思晗科技股份有限公司 基于BatchNormal层优化的YOLO V3模型裁剪方法
CN112052951A (zh) * 2020-08-31 2020-12-08 北京中科慧眼科技有限公司 一种剪枝神经网络方法、系统、设备及可读存储介质
CN112070051B (zh) * 2020-09-16 2022-09-20 华东交通大学 基于剪枝压缩的疲劳驾驶快速检测方法
CN112308066A (zh) * 2020-10-23 2021-02-02 西安科锐盛创新科技有限公司 一种车牌识别系统
CN112257794B (zh) * 2020-10-27 2022-10-28 东南大学 一种基于yolo的轻量级的目标检测方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095795A1 (en) * 2017-03-15 2019-03-28 Samsung Electronics Co., Ltd. System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN111008640A (zh) * 2019-10-17 2020-04-14 平安科技(深圳)有限公司 图像识别模型训练及图像识别方法、装置、终端及介质
CN111768372A (zh) * 2020-06-12 2020-10-13 国网智能科技股份有限公司 一种gis设备腔体内部异物检测方法及系统
CN111931901A (zh) * 2020-07-02 2020-11-13 华为技术有限公司 一种神经网络构建方法以及装置
CN112836751A (zh) * 2021-02-03 2021-05-25 歌尔股份有限公司 一种目标检测方法和装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775381A (zh) * 2022-12-15 2023-03-10 华洋通信科技股份有限公司 一种光照不均匀下的矿井电机车路况识别方法
CN115775381B (zh) * 2022-12-15 2023-10-20 华洋通信科技股份有限公司 一种光照不均匀下的矿井电机车路况识别方法

Also Published As

Publication number Publication date
CN112836751A (zh) 2021-05-25

Similar Documents

Publication Publication Date Title
WO2022166294A1 (zh) 一种目标检测方法和装置
CN108460523B (zh) 一种风控规则生成方法和装置
CN112634209A (zh) 一种产品缺陷检测方法和装置
WO2022166293A1 (zh) 一种目标检测方法和装置
TW201928841A (zh) 訓練風控模型和風控的方法、裝置及設備
CN109961107B (zh) 目标检测模型的训练方法、装置、电子设备及存储介质
WO2021008119A1 (zh) 一种业务处理方法、装置及设备
CN109857984B (zh) 一种锅炉负荷率-效能曲线的回归方法和装置
WO2022160856A1 (zh) 一种分类网络及其实现方法和装置
WO2022116720A1 (zh) 目标检测方法、装置和电子设备
TW202006642A (zh) 風險防控方法、系統及終端設備
CN111198906A (zh) 一种数据处理方法、装置、系统及存储介质
WO2020177488A1 (zh) 一种区块链交易追溯的方法及装置
CN108280135B (zh) 实现数据结构可视化的方法、装置和电子设备
US20200294057A1 (en) Business processing method, apparatus, and equipment
CN109598478B (zh) 一种风测结果描述文案的生成方法、装置及电子设备
CN115658732A (zh) 一种sql语句的优化查询方法、装置、电子设备及介质
CN111027716A (zh) 一种负荷预测的方法及装置
CN113641708B (zh) 规则引擎的优化方法、数据匹配方法及装置、存储介质、终端
CN115858306A (zh) 一种基于事件流的微服务监控方法、终端设备及存储介质
EP4170524A1 (en) Information processing method and apparatus for batch stream fusion, and storage medium
CN107368281B (zh) 一种数据处理方法及装置
WO2021115304A1 (zh) 数据处理方法、装置、设备及存储介质
CN114817212A (zh) 一种数据库的优化方法及优化装置
CN109325127B (zh) 一种风险识别方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21924290

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21924290

Country of ref document: EP

Kind code of ref document: A1