CN112836751A - Target detection method and device - Google Patents

Info

Publication number
CN112836751A
CN112836751A (application CN202110148533.4A)
Authority
CN
China
Prior art keywords
pruning
convolution layer
layer
target detection
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110148533.4A
Other languages
Chinese (zh)
Inventor
张一凡 (Zhang Yifan)
刘杰 (Liu Jie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Inc
Original Assignee
Goertek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Inc filed Critical Goertek Inc
Priority to CN202110148533.4A priority Critical patent/CN112836751A/en
Publication of CN112836751A publication Critical patent/CN112836751A/en
Priority to PCT/CN2021/130259 priority patent/WO2022166294A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection


Abstract

The application discloses a target detection method and device. The method comprises the following steps: constructing a basic model based on YOLO-v4, the backbone network of which comprises a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layers in each residual block that satisfy the first pruning condition as first convolutional layers, and the convolutional layers that do not satisfy the first pruning condition as second convolutional layers; pruning each first convolutional layer while leaving each second convolutional layer unpruned, to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result. Compared with a model built with the original YOLO-v4, the target detection model obtained by this technical scheme is smaller in size, still maintains high target detection accuracy, and effectively reduces the amount of computation in the target detection process.

Description

Target detection method and device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a target detection method and apparatus.
Background
YOLO (You Only Look Once; the industry has not yet settled on a Chinese name) is a typical single-stage target detection technique: information such as the position and category of a target is regressed directly from the original image. Its fourth version, YOLO-v4, has now been developed.
In practical applications, users often build a target detection model based on YOLO-v4 and then adjust its network structure according to actual requirements. These adjustments may increase the amount of computation, so reducing the computation involved in target detection is a problem that needs to be solved.
It should be noted that the statements herein merely provide background information related to the present application and may not necessarily constitute prior art.
Disclosure of Invention
The embodiment of the application provides a target detection method and device, which aim to reduce the calculated amount in the target detection process.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a target detection method, including: constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and inputting the detection image into a target detection model to obtain a target detection result.
In some embodiments, the target detection method, wherein the taking, as the second convolutional layer, a convolutional layer that does not satisfy the first pruning condition among the residual blocks comprises: and if the output information of one convolutional layer is the input information of the residual structure in the residual block, the convolutional layer does not meet the first pruning condition.
In some embodiments of the target detection method, pruning each first convolution layer includes: setting a second pruning condition related to the convolution layer position; judging whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, pruning the output channel of that first convolution layer.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the second pruning condition includes: and if the first convolution layer is the last convolution layer of the residual structure in the residual block, the first convolution layer does not meet the second pruning condition, otherwise, the first convolution layer meets the second pruning condition.
In some embodiments, the pruning the output channel of the first convolutional layer in the object detection method includes: and performing network slimming pruning on the output channel of the first convolution layer according to the gamma parameter of the BN layer connected to the first convolution layer.
In some embodiments, the pruning each first convolution layer in the object detection method includes: setting a third pruning condition related to the input information; judging whether each first convolution layer meets a third pruning condition; if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the output information of a second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
In some embodiments of the target detection method, determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolution layer is the result of element-wise addition of the output information of a second convolution layer and the output information of another convolution layer, the first convolution layer does not satisfy the third pruning condition.
In a second aspect, an embodiment of the present application further provides an object detection apparatus, including: the building unit is used for building a basic model based on YOLO-v4, and a backbone network of the basic model comprises a plurality of residual blocks; a pruning unit for setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and the detection unit is used for inputting the detection image into the target detection model to obtain a target detection result.
In some embodiments, in the target detection apparatus, the pruning unit is configured to, if the output information of a convolutional layer is the input information of the residual structure in the residual block, not satisfy the first pruning condition for the convolutional layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to set a second pruning condition related to the convolution layer position; judge whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, prune the output channel of that first convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit is configured to, if a first convolutional layer is a last convolutional layer of a residual structure in the residual block, not satisfy the second pruning condition, otherwise, satisfy the second pruning condition.
In some embodiments, in the object detection apparatus, the pruning unit is configured to perform network slimming pruning on the output channel of the first convolutional layer according to the γ parameters of the BN layer connected after the first convolutional layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to set a third pruning condition related to the input information; judging whether each first convolution layer meets a third pruning condition; if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to, if the input information received by one first convolution layer is the output information of a second convolution layer, not satisfy the third pruning condition for the first convolution layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to, if the input information received by one of the first convolution layers is a detection image, not satisfy the third pruning condition for the first convolution layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured so that a first convolutional layer does not satisfy the third pruning condition if the input information received by that first convolutional layer is the result of element-wise addition of the output information of a second convolutional layer and the output information of another convolutional layer.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method of object detection as described in any one of the above.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs, which when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the object detection method as described in any one of the above.
The embodiments of the application adopt at least one technical scheme that achieves the following beneficial effects: by setting a pruning mode related to output information, the convolutional layers of the residual blocks in the basic model built on YOLO-v4 are selectively pruned to obtain the target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size, still maintains high target detection accuracy, and effectively reduces the amount of computation in the target detection process.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart of a method for target detection according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a residual block structure according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an object detection device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The pruning technique is widely applied in the field of neural networks. When the parameters of the neural network in the target detection model are numerous, but some of the parameters do not contribute much to the final output result and appear redundant, a pruning technique can be used, namely, the redundant parameters are pruned.
Although pruning can reduce the volume of the target detection model, if pruning is performed randomly, the target detection accuracy is reduced, and therefore, how to reasonably prune the target detection model needs to be considered.
The technical idea of the method is that a basic model is built based on YOLO-v4, the residual blocks of a main network of the basic model are selected as pruning objects, and selective pruning is carried out on the convolution layers of the residual blocks, so that the size of a target detection model is reduced, and meanwhile, high target detection precision can be kept.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present application, as shown in fig. 1, the method includes:
step S110, constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks.
Here, the network structure of the basic model may be adjusted as needed, for example, a number of detection branches may be added, the downsampling structure of the backbone network may be adjusted to reduce the number of downsampling, and the like, which is not limited in this application.
In some embodiments, the backbone network may include a plurality of residual blocks as shown in fig. 2. The residual block shown in fig. 2 includes 7 convolutional layers (conv): convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270. In addition, an element-wise addition (the ⊕ operation in fig. 2) combining the outputs of the corresponding convolutional layers and a splicing operation (concat) are required.
Taking the topmost convolutional layer 210 as an example, 3 × 3 represents the size of the convolutional kernel used by convolutional layer 210, and 128 in the parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described in detail.
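The structural constraint behind the pruning rules can be sketched as plain channel bookkeeping. The sketch below is a minimal pure-Python illustration, not the patent's implementation: the layer names follow fig. 2, but the branch wiring (conv220 as the cross-stage branch, conv230 feeding the residual structure, concat of conv220 and conv260) and all channel counts other than conv210's 128 are assumptions inferred from the description.

```python
def residual_block_channels(c220, c230, c240, c250, c260):
    """Channel bookkeeping for the assumed Fig. 2 residual block.

    conv230 feeds the residual structure (conv240 -> conv250 -> add),
    so the element-wise addition requires conv250's output channel
    count to match conv230's. Raises if the addition would receive
    mismatched inputs."""
    residual_in = c230
    if c250 != residual_in:
        raise ValueError("residual addition channel mismatch")
    added = residual_in          # addition preserves the channel count
    concat = c220 + c260         # concat of the two branches feeds conv270
    return {"residual_in": residual_in, "added": added, "concat": concat}

# With matching counts the block is consistent; pruning conv230's
# output alone would trigger the mismatch, which is why it is kept.
ok = residual_block_channels(c220=64, c230=64, c240=64, c250=64, c260=64)
```

Running the helper with a pruned conv230 (e.g. `c230=48` while `c250=64`) raises the mismatch error, mirroring the reason the second convolutional layer is left unpruned.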
Step S120, setting a first pruning condition related to the output information; and taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer.
And step S130, pruning each first convolution layer, and not pruning each second convolution layer to obtain the target detection model.
Here, pruning may include pruning of input channels and/or pruning of output channels.
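Mechanically, input-channel and output-channel pruning are slices along different axes of a convolution weight tensor of shape (out_channels, in_channels, k, k). The numpy sketch below is an illustration of that distinction only; the helper name and shapes are hypothetical and not taken from the patent.

```python
import numpy as np

def prune_conv_weights(w, keep_out=None, keep_in=None):
    """Slice a conv weight tensor of shape (out_ch, in_ch, k, k).

    keep_out / keep_in are lists of channel indices to retain;
    None leaves that dimension untouched."""
    if keep_out is not None:
        w = w[keep_out, :, :, :]   # output-channel pruning (axis 0)
    if keep_in is not None:
        w = w[:, keep_in, :, :]    # input-channel pruning (axis 1)
    return w

w = np.zeros((128, 64, 3, 3))
pruned = prune_conv_weights(w, keep_out=[0, 1, 2, 3], keep_in=list(range(32)))
```

Note that output-channel pruning of one layer forces matching input-channel pruning of the layer that consumes it, which is the consistency concern the conditions below address.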
And step S140, inputting the detection image into a target detection model to obtain a target detection result. Specifically, the target of detection may be a defect of a vehicle, a pedestrian, or an industrial product, or the like.
It can be seen that, in the method shown in fig. 1, the convolution layer of the residual block in the basic model constructed based on YOLO-v4 is selectively pruned by setting a pruning mode related to the output information to obtain the target detection model, and compared with the model constructed by using the original YOLO-v4, the target detection model has a smaller volume, and can still maintain higher target detection accuracy, thereby effectively reducing the calculation amount in the target detection process.
In some embodiments, the target detection method, wherein the taking, as the second convolutional layer, a convolutional layer that does not satisfy the first pruning condition among the residual blocks comprises: and if the output information of one convolutional layer is the input information of the residual structure in the residual block, the convolutional layer does not meet the first pruning condition.
The reason for this is that, in the base model constructed based on YOLO-v4, the residual block has a CSP (Cross Stage Partial) structure, and if the second convolution layer (whose output information is the input information of the residual structure in the residual block) is pruned, the number of channels output to the residual structure may change, which may affect the use of the residual structure.
It should be noted here that the output information of the second convolutional layer may be subjected to batch normalization before being used as the input information of the residual structure in the residual block, but is not subjected to convolution by other convolutional layers.
Taking fig. 2 as an example, convolutional layer 240, convolutional layer 250, and the subsequent element-wise addition constitute a residual structure. Since the output information of convolutional layer 230, shown in the dashed box, is the input information of the residual structure, convolutional layer 230 is the second convolutional layer in the residual block and is not pruned.
In some embodiments of the target detection method, pruning each first convolution layer includes: setting a second pruning condition related to the convolution layer position; judging whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, pruning the output channel of that first convolution layer.
As described above, the residual block has the CSP structure, and the second pruning condition is set according to the convolution layer position, so that the residual structure can be reasonably used.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the second pruning condition includes: and if the first convolution layer is the last convolution layer of the residual structure in the residual block, the first convolution layer does not meet the second pruning condition, otherwise, the first convolution layer meets the second pruning condition.
Taking fig. 2 as an example, convolutional layer 250 is the last convolutional layer of the residual structure in the residual block. To ensure the correctness of the subsequent element-wise addition, the output channel of convolutional layer 250 is not pruned; the output channels of the convolutional layers other than convolutional layer 250 and convolutional layer 230 can be pruned.
Pruning can be carried out in many ways, and any existing technique may be selected. Preferably, in some embodiments of the target detection method, pruning the output channel of the first convolution layer includes: performing network slimming pruning on the output channel of the first convolution layer according to the γ parameters of the BN layer connected after the first convolution layer.
Here, the basic model needs to use BN (Batch Normalization) layers, and sparsity training is first applied to the basic model so that the γ parameters of each BN layer become sparse, thereby satisfying the conditions for network slimming pruning. The specific network slimming operation can be implemented with reference to the prior art and is not described again here.
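The selection step of network-slimming-style pruning can be sketched as thresholding the BN γ magnitudes. This is a hedged illustration of the general technique, not the patent's exact procedure: the function name, the global-quantile threshold, and the prune ratio are all assumptions.

```python
import numpy as np

def slimming_mask(gamma, prune_ratio=0.5):
    """Return a boolean keep-mask over BN channels: keep the channels
    whose |gamma| exceeds the threshold implied by prune_ratio
    (network-slimming style selection)."""
    g = np.abs(np.asarray(gamma, dtype=float))
    threshold = np.quantile(g, prune_ratio)
    keep = g > threshold
    if not keep.any():               # never prune every channel
        keep[np.argmax(g)] = True
    return keep

# After sparsity training, many gammas are driven toward zero;
# those channels are the ones selected for removal.
gamma = [0.9, 0.01, 0.5, 0.002, 0.7, 0.03]
mask = slimming_mask(gamma, prune_ratio=0.5)
```

The resulting mask would then drive the output-channel slicing of the first convolutional layer and the matching input-channel slicing of its consumer.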
In some embodiments, the pruning each first convolution layer in the object detection method includes: setting a third pruning condition related to the input information; judging whether each first convolution layer meets a third pruning condition; if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
The input information here is the counterpart of the output information discussed above. The difference is that output information is produced by a convolutional layer itself, while input information is received by a convolutional layer and may be the output of another network structure, the original picture, and so on. The input channels should therefore be pruned judiciously according to the situation.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the output information of a second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
Since the second convolutional layer is not pruned, the input channels of any convolutional layer receiving the output information of the second convolutional layer are correspondingly left unpruned, to keep the channel counts consistent.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
Of course, the detection image may undergo some preprocessing, i.e., the input information is the tensor representation of the detection image. In these cases the input information is fixed, so the input channels likewise cannot be pruned.
In some embodiments of the target detection method, determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the result of element-wise addition of the output information of the second convolutional layer and the output information of another convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
To ensure the normal use of the residual structure, the number of channels of the result of the element-wise addition of the output information of the second convolutional layer and the output information of another convolutional layer is also fixed. Therefore, when this result serves as the input information of a first convolutional layer, the input channels of that first convolutional layer cannot be pruned.
To summarize, taking fig. 2 as an example:
the convolutional layer 210: if the input information is a detection image, the input channel can not be pruned; if the input information is not a detected image, the input channel may prune based on the input information. In either case, the number of output channels of convolutional layer 210 can be pruned according to the γ parameter of the following BN layer.
A convolutional layer 220: the number of input channels thereof may be pruned according to the output information of the convolutional layer 210, and the number of output channels thereof may be pruned according to the γ parameter of the subsequent BN layer.
The convolutional layer 230: the number of input channels and the number of output channels are not pruned.
The convolutional layer 240: the number of input channels is not pruned, and the number of output channels can be pruned according to the gamma parameter of the subsequent BN layer.
Convolutional layer 250: the number of input channels may be pruned based on the output information of convolutional layer 240, and the number of output channels may not be pruned.
Convolutional layer 260: the number of input channels is not pruned, and the number of output channels can be pruned according to the gamma parameter of the subsequent BN layer.
Convolutional layer 270: the number of input channels is pruned according to the concatenated (concat) output of convolutional layer 220 and convolutional layer 260, and the number of output channels can be pruned according to the γ parameter of the subsequent BN layer.
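The per-layer decisions enumerated above can be collected into a single lookup table. The encoding below is an illustrative sketch; the tag names ("keep", "bn_gamma", etc.) are hypothetical labels summarizing the text, not identifiers from the patent.

```python
# Pruning plan for the Fig. 2 residual block, summarizing the list above.
# "bn_gamma" = output channels prunable via the gamma parameters of the
# following BN layer; "follow_X" = input channels pruned to match X's output.
PRUNING_PLAN = {
    "conv210": {"in": "depends_on_input", "out": "bn_gamma"},
    "conv220": {"in": "follow_conv210",   "out": "bn_gamma"},
    "conv230": {"in": "keep",             "out": "keep"},
    "conv240": {"in": "keep",             "out": "bn_gamma"},
    "conv250": {"in": "follow_conv240",   "out": "keep"},
    "conv260": {"in": "keep",             "out": "bn_gamma"},
    "conv270": {"in": "follow_concat",    "out": "bn_gamma"},
}

def output_prunable(layer):
    """True if the layer's output channels may be pruned via BN gamma."""
    return PRUNING_PLAN[layer]["out"] == "bn_gamma"
```

Under this encoding, exactly the two layers protected by the conditions (conv230 as the second convolutional layer, conv250 as the last layer of the residual structure) keep their output channels intact.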
Pruning convolutional layers 210 to 270 as described above yields target detection model 1, while target detection model 2 is built with the original YOLO-v4. Training on the same sample set and testing on the same test set show that target detection model 1 is greatly reduced in size relative to target detection model 2, and is slightly improved on the two indices of mean average precision (mAP) and detection precision, indicating that the pruning does not degrade model performance and may even improve it, which is an unexpected effect.
In addition, an embodiment of the present application further provides an object detection apparatus, which is used for implementing the object detection method as described in any one of the above.
Fig. 3 shows a schematic structural diagram of an object detection apparatus according to an embodiment of the present application. As shown in fig. 3, the object detection device 300 includes:
a building unit 310, configured to build a base model based on YOLO-v4, where a backbone network of the base model includes a plurality of residual blocks.
Here, the network structure of the basic model may be adjusted as needed, for example, a number of detection branches may be added, the downsampling structure of the backbone network may be adjusted to reduce the number of downsampling, and the like, which is not limited in this application.
In some embodiments, the backbone network may include a plurality of residual blocks as shown in fig. 2. The residual block shown in fig. 2 includes 7 convolutional layers (conv), which are convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270, respectively. Taking the topmost convolutional layer 210 as an example, 3 × 3 represents the size of the convolutional kernel used by convolutional layer 210, and 128 in the parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described in detail.
A pruning unit 320 for setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; and pruning each first convolution layer, and not pruning each second convolution layer to obtain the target detection model.
Here, pruning may include pruning of input channels and/or pruning of output channels.
The detecting unit 330 is configured to input the detected image into the target detection model to obtain a target detection result.
It can be seen that, in the apparatus shown in fig. 3, the convolution layer of the residual block in the basic model constructed based on YOLO-v4 is selectively pruned by setting a pruning method related to the output information to obtain the target detection model, and compared with the model constructed by using the original YOLO-v4, the volume of the target detection model is smaller, and higher target detection accuracy can still be maintained, thereby effectively reducing the calculation amount in the target detection process.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to, if the output information of a convolutional layer is the input information of the residual structure in the residual block, not satisfy the first pruning condition for the convolutional layer.
In some embodiments, in the object detection apparatus, the pruning unit 320 is configured to set a second pruning condition related to the convolution layer position; judge whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, prune the output channel of that first convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to, if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, not satisfy the second pruning condition, otherwise, the first convolutional layer satisfies the second pruning condition.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to perform network-slimming pruning on the output channels of a first convolution layer according to the γ parameters of the BN layer connected after that first convolution layer.
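The γ-based output-channel selection can be sketched as follows. This is a minimal, hedged illustration of the network-slimming idea the patent relies on: channels whose BN scaling factor has small magnitude contribute little and are pruned first. The data layout and the keep-ratio hyperparameter are assumptions for illustration, not values taken from the patent.

```python
# Hedged sketch: selecting which output channels of a pruned ("first")
# convolution layer to keep, network-slimming style, using the scaling
# factors (gamma) of the BatchNorm layer that follows the convolution.
# The list-based layout and the keep_ratio value are illustrative
# assumptions, not taken from the patent text.

def select_channels_by_gamma(gammas, keep_ratio=0.5):
    """Return sorted indices of the channels whose |gamma| is largest.

    gammas:     per-channel scaling factors of the BN layer connected
                after the convolution (one value per output channel).
    keep_ratio: fraction of channels to keep (assumed hyperparameter).
    """
    n_keep = max(1, int(len(gammas) * keep_ratio))  # never prune every channel
    # Rank channels by the magnitude of their BN scaling factor.
    ranked = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]), reverse=True)
    return sorted(ranked[:n_keep])

# A BN gamma vector for an 8-channel convolution; the four channels with
# small |gamma| (indices 1, 3, 5, 7) are pruned away.
kept = select_channels_by_gamma([0.9, 0.01, 0.5, 0.02, 0.7, 0.03, 0.8, 0.04], keep_ratio=0.5)
print(kept)  # -> [0, 2, 4, 6]
```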
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to set a third pruning condition related to the input information; to determine whether each first convolution layer satisfies the third pruning condition; and, if a first convolution layer satisfies the third pruning condition, to prune the input channels of that first convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to determine that a first convolution layer does not satisfy the third pruning condition if the input information it receives is the output information of a second convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to determine that a first convolution layer does not satisfy the third pruning condition if the input information it receives is the detection image.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to determine that a first convolution layer does not satisfy the third pruning condition if the input information it receives is the result of performing an xor calculation on the output information of a second convolution layer and the output information of another convolution layer.
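The three input-side exclusions described above can be combined into one predicate. The following Python sketch is hedged: the string descriptors of an input's source are illustrative assumptions, not identifiers from the patent; the point is only that input-channel pruning is allowed when none of the exclusions applies.

```python
# Hedged sketch combining the three input-side rules above: a "first"
# convolution layer's input channels are pruned only when no exclusion
# applies. The source descriptors ("image", "second_conv", ...) are
# illustrative assumptions.

def input_channels_prunable(input_sources):
    """Return True when the third pruning condition is satisfied.

    input_sources: descriptors of where the layer's input comes from.
    Exclusions (condition NOT satisfied):
      - the input is the detection image itself;
      - the input is the output of an unpruned "second" convolution layer;
      - the input merges a second convolution layer's output with another
        layer's output inside the residual structure.
    """
    excluded = {"image", "second_conv", "residual_merge_with_second_conv"}
    return not any(src in excluded for src in input_sources)

print(input_channels_prunable(["first_conv"]))   # -> True: safe to prune input channels
print(input_channels_prunable(["second_conv"]))  # -> False: upstream layer was not pruned
```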
It can be understood that the target detection apparatus can implement the steps of the target detection method provided in the foregoing embodiment, and the related explanations about the target detection method are applicable to the target detection apparatus, and are not described herein again.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 4, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include a volatile memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory is configured to store a program. In particular, the program may include program code comprising computer operating instructions. The memory may include both volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming the target detection apparatus at the logical level. The number of target detection apparatuses shown in fig. 4 does not constitute a limitation of the present application. The processor is configured to execute the program stored in the memory, and is specifically configured to perform the following operations:
constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and inputting the detection image into a target detection model to obtain a target detection result.
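The sequence of operations listed above can be sketched as a small Python pipeline. All names here are illustrative assumptions (the model is stubbed as nested lists so the control flow is visible); this is not an API from the patent.

```python
# Hedged sketch of the operation sequence above: build the base model,
# classify convolution layers per the first pruning condition, prune only
# the "first" layers, and return a model ready for detection. Everything
# is stubbed; all names are illustrative assumptions.

def run_pipeline():
    # 1. Construct the base model: 3 residual blocks, each with one layer
    #    feeding the residual structure (kept) and two pruning candidates.
    model = [
        [{"prunable": False}, {"prunable": True}, {"prunable": True}]
        for _ in range(3)
    ]
    pruned = 0
    for block in model:              # 2-3. selective pruning per block
        for layer in block:
            if layer["prunable"]:    # first pruning condition satisfied
                layer["pruned"] = True
                pruned += 1
    return pruned                    # 4. the pruned model would now run detection

print(run_pipeline())  # -> 6 (two pruned layers in each of three blocks)
```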
The method performed by the target detection apparatus according to the embodiment shown in fig. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, capable of implementing or performing the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method executed by the target detection apparatus in fig. 1, and implement the function of the target detection apparatus in the embodiment shown in fig. 3, which is not described herein again in this embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the object detection apparatus in the embodiment shown in fig. 1, and are specifically configured to perform:
constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and inputting the detection image into a target detection model to obtain a target detection result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of object detection, comprising:
constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks;
setting a first pruning condition related to the output information;
taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer;
pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model;
and inputting the detection image into the target detection model to obtain a target detection result.
2. The method of claim 1, wherein the taking convolutional layers of the residual blocks that do not satisfy the first pruning condition as second convolutional layers comprises:
and if the output information of one convolutional layer is the input information of the residual structure in the residual block, the convolutional layer does not meet the first pruning condition.
3. The method of claim 1, wherein pruning each first convolutional layer comprises:
setting a second pruning condition related to the convolution layer position;
judging whether each first convolution layer meets the second pruning condition or not;
if a first convolution layer meets the second pruning condition, pruning the output channel of the first convolution layer.
4. The method of claim 3, wherein the determining whether each first convolution layer satisfies the second pruning condition comprises:
and if the first convolution layer is the last convolution layer of the residual structure in the residual block, the first convolution layer does not meet the second pruning condition, otherwise, the first convolution layer meets the second pruning condition.
5. The method of claim 3, wherein pruning the output channel of the first convolutional layer comprises:
and performing network slimming pruning on the output channel of the first convolution layer according to the gamma parameter of the BN layer connected to the first convolution layer.
6. The method of claim 1, wherein pruning each first convolutional layer comprises:
setting a third pruning condition related to the input information;
judging whether each first convolution layer meets the third pruning condition or not;
if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
7. The method of claim 6, wherein the determining whether each first convolution layer satisfies the third pruning condition comprises:
if the input information received by a first convolutional layer is the output information of the second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
8. The method of claim 6, wherein the determining whether each first convolution layer satisfies the third pruning condition comprises:
if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
9. The method of claim 6, wherein the determining whether each first convolution layer satisfies the third pruning condition comprises:
and if the input information received by one first convolution layer is the result of the XOR calculation of the output information of the second convolution layer and the output information of the other convolution layer, the first convolution layer does not satisfy the third pruning condition.
10. An object detection device, comprising:
a building unit, configured to build a base model based on YOLO-v4, where a backbone network of the base model includes a plurality of residual blocks;
a pruning unit for setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model;
and the detection unit is used for inputting the detection image into the target detection model to obtain a target detection result.
CN202110148533.4A 2021-02-03 2021-02-03 Target detection method and device Pending CN112836751A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110148533.4A CN112836751A (en) 2021-02-03 2021-02-03 Target detection method and device
PCT/CN2021/130259 WO2022166294A1 (en) 2021-02-03 2021-11-12 Target detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110148533.4A CN112836751A (en) 2021-02-03 2021-02-03 Target detection method and device

Publications (1)

Publication Number Publication Date
CN112836751A (en) 2021-05-25

Family

ID=75931845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110148533.4A Pending CN112836751A (en) 2021-02-03 2021-02-03 Target detection method and device

Country Status (2)

Country Link
CN (1) CN112836751A (en)
WO (1) WO2022166294A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775381B (en) * 2022-12-15 2023-10-20 华洋通信科技股份有限公司 Mine electric locomotive road condition identification method under uneven illumination

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460613A (en) * 2018-11-12 2019-03-12 北京迈格威科技有限公司 Model method of cutting out and device
CN109740534A (en) * 2018-12-29 2019-05-10 北京旷视科技有限公司 Image processing method, device and processing equipment
CN110033083A (en) * 2019-03-29 2019-07-19 腾讯科技(深圳)有限公司 Convolutional neural networks model compression method and apparatus, storage medium and electronic device
US20190362235A1 (en) * 2018-05-23 2019-11-28 Xiaofan Xu Hybrid neural network pruning
CN110633747A (en) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 Compression method, device, medium and electronic device for target detector
CN111126501A (en) * 2019-12-26 2020-05-08 厦门市美亚柏科信息股份有限公司 Image identification method, terminal equipment and storage medium
CN111178133A (en) * 2019-12-03 2020-05-19 哈尔滨工程大学 Natural scene image text recognition method based on pruning depth model
CN111210016A (en) * 2018-11-21 2020-05-29 辉达公司 Pruning a neural network containing element-level operations
CN111652370A (en) * 2020-05-28 2020-09-11 成都思晗科技股份有限公司 BatchNormal layer optimization-based YOLO V3 model clipping method
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training
CN111768372A (en) * 2020-06-12 2020-10-13 国网智能科技股份有限公司 Method and system for detecting foreign matters in GIS equipment cavity
CN112052951A (en) * 2020-08-31 2020-12-08 北京中科慧眼科技有限公司 Pruning neural network method, system, equipment and readable storage medium
CN112070051A (en) * 2020-09-16 2020-12-11 华东交通大学 Pruning compression-based fatigue driving rapid detection method
CN112257794A (en) * 2020-10-27 2021-01-22 东南大学 YOLO-based lightweight target detection method
CN112308066A (en) * 2020-10-23 2021-02-02 西安科锐盛创新科技有限公司 License plate recognition system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11354577B2 (en) * 2017-03-15 2022-06-07 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN111008640B (en) * 2019-10-17 2024-03-19 平安科技(深圳)有限公司 Image recognition model training and image recognition method, device, terminal and medium
CN111931901A (en) * 2020-07-02 2020-11-13 华为技术有限公司 Neural network construction method and device
CN112836751A (en) * 2021-02-03 2021-05-25 歌尔股份有限公司 Target detection method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGLI LIU, HUIHUI BAI, FEIRAN JIE, MENGMENG ZHANG: "Channel pruning for object detection network", IET 8th International Conference on Wireless, Mobile & Multimedia Networks *
LIU XIANGLONG ET AL. (EDS.): "PaddlePaddle Deep Learning in Action" (飞桨PaddlePaddle深度学习实战), 31 August 2020 *
BAI SHILEI, YIN KEXIN, ZHU JIANQI: "Traffic Sign Detection Algorithm Based on Lightweight YOLOv3" (轻量级YOLOv3的交通标志检测算法), Computer and Modernization (计算机与现代化) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166294A1 (en) * 2021-02-03 2022-08-11 歌尔股份有限公司 Target detection method and apparatus
CN113705775A (en) * 2021-07-29 2021-11-26 浪潮电子信息产业股份有限公司 Neural network pruning method, device, equipment and storage medium
CN116468100A (en) * 2023-03-06 2023-07-21 美的集团(上海)有限公司 Residual pruning method, residual pruning device, electronic equipment and readable storage medium
CN116468100B (en) * 2023-03-06 2024-05-10 美的集团(上海)有限公司 Residual pruning method, residual pruning device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
WO2022166294A1 (en) 2022-08-11

Similar Documents

Publication Publication Date Title
CN112836751A (en) Target detection method and device
CN108460523B (en) Wind control rule generation method and device
CN108845936B (en) AB testing method and system based on massive users
CN112634209A (en) Product defect detection method and device
CN108846749B (en) Partitioned transaction execution system and method based on block chain technology
CN110222936B (en) Root cause positioning method and system of business scene and electronic equipment
CN112949692A (en) Target detection method and device
CN109299276B (en) Method and device for converting text into word embedding and text classification
CN112598321B (en) Risk prevention and control method, system and terminal equipment
CN114694005A (en) Target detection model training method and device, and target detection method and device
CN112991349B (en) Image processing method, device, equipment and storage medium
CN114419679B (en) Data analysis method, device and system based on wearable device data
CN112766397B (en) Classification network and implementation method and device thereof
CN109542785B (en) Invalid bug determination method and device
CN113743618A (en) Time series data processing method and device, readable medium and electronic equipment
CN109740336B (en) Method and device for identifying verification information in picture and electronic equipment
CN109145821B (en) Method and device for positioning pupil image in human eye image
CN109598478B (en) Wind measurement result description document generation method and device and electronic equipment
CN108388982B (en) Task running method and device and electronic equipment
CN114397671B (en) Course angle smoothing method and device of target and computer readable storage medium
CN111046909A (en) Load prediction method and device
CN110018844B (en) Management method and device of decision triggering scheme and electronic equipment
CN113111872B (en) Training method and device of image recognition model, electronic equipment and storage medium
US11846672B2 (en) Method and device for testing system-on-chip, electronic device using method, and computer readable storage medium
CN113344145A (en) Character recognition method, character recognition device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210525