CN112836751A - Target detection method and device - Google Patents

Info

Publication number
CN112836751A
CN112836751A (application CN202110148533.4A)
Authority
CN
China
Prior art keywords
pruning
convolution layer
layer
target detection
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110148533.4A
Other languages
Chinese (zh)
Inventor
张一凡 (Zhang Yifan)
刘杰 (Liu Jie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Inc
Original Assignee
Goertek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Inc filed Critical Goertek Inc
Priority to CN202110148533.4A priority Critical patent/CN112836751A/en
Publication of CN112836751A publication Critical patent/CN112836751A/en
Priority to PCT/CN2021/130259 priority patent/WO2022166294A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection


Abstract

The application discloses a target detection method and device. The method comprises the following steps: constructing a basic model based on YOLO-v4, the backbone network of which comprises a plurality of residual blocks; setting a first pruning condition related to output information; taking the convolutional layers in each residual block that satisfy the first pruning condition as first convolutional layers, and the convolutional layers that do not satisfy the first pruning condition as second convolutional layers; pruning each first convolutional layer while leaving each second convolutional layer unpruned, to obtain a target detection model; and inputting a detection image into the target detection model to obtain a target detection result. Compared with a model built with the original YOLO-v4, the target detection model obtained by this technical scheme is smaller in size, still maintains high target detection accuracy, and effectively reduces the amount of computation in the target detection process.

Description

Target detection method and device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a target detection method and apparatus.
Background
YOLO (You Only Look Once; the industry has not yet settled on a Chinese name) is a typical single-stage target detection technique: information such as the position and category of a target is regressed directly from the original image. Its fourth version, YOLO-v4, has now been developed.
In practical applications, users often build a target detection model based on YOLO-v4 and then adjust its network structure according to actual requirements. These adjustments may increase the amount of computation, so reducing the computation involved in target detection is a problem that needs to be solved.
It should be noted that the statements herein merely provide background information related to the present application and may not necessarily constitute prior art.
Disclosure of Invention
The embodiment of the application provides a target detection method and device, which aim to reduce the calculated amount in the target detection process.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a target detection method, including: constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and inputting the detection image into a target detection model to obtain a target detection result.
In some embodiments, the target detection method, wherein the taking, as the second convolutional layer, a convolutional layer that does not satisfy the first pruning condition among the residual blocks comprises: and if the output information of one convolutional layer is the input information of the residual structure in the residual block, the convolutional layer does not meet the first pruning condition.
In some embodiments of the target detection method, pruning each first convolution layer includes: setting a second pruning condition related to the convolution layer position; judging whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, pruning the output channel of that first convolution layer.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the second pruning condition includes: and if the first convolution layer is the last convolution layer of the residual structure in the residual block, the first convolution layer does not meet the second pruning condition, otherwise, the first convolution layer meets the second pruning condition.
In some embodiments, the pruning the output channel of the first convolutional layer in the object detection method includes: and performing network slimming pruning on the output channel of the first convolution layer according to the gamma parameter of the BN layer connected to the first convolution layer.
In some embodiments, the pruning each first convolution layer in the object detection method includes: setting a third pruning condition related to the input information; judging whether each first convolution layer meets a third pruning condition; if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the output information of a second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
In some embodiments of the target detection method, determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolution layer is the result of element-wise addition of the output information of a second convolution layer and the output information of another convolution layer, the first convolution layer does not satisfy the third pruning condition.
In a second aspect, an embodiment of the present application further provides an object detection apparatus, including: the building unit is used for building a basic model based on YOLO-v4, and a backbone network of the basic model comprises a plurality of residual blocks; a pruning unit for setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and the detection unit is used for inputting the detection image into the target detection model to obtain a target detection result.
In some embodiments, in the target detection apparatus, the pruning unit is configured to, if the output information of a convolutional layer is the input information of the residual structure in the residual block, not satisfy the first pruning condition for the convolutional layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to set a second pruning condition related to the convolution layer position; judge whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, prune the output channel of that first convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit is configured to, if a first convolutional layer is a last convolutional layer of a residual structure in the residual block, not satisfy the second pruning condition, otherwise, satisfy the second pruning condition.
In some embodiments, in the object detection apparatus, the pruning unit is configured to perform network slimming pruning on the output channel of the first convolutional layer according to the γ parameters of the BN layer connected after the first convolutional layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to set a third pruning condition related to the input information; judging whether each first convolution layer meets a third pruning condition; if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to, if the input information received by one first convolution layer is the output information of a second convolution layer, not satisfy the third pruning condition for the first convolution layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured to, if the input information received by one of the first convolution layers is a detection image, not satisfy the third pruning condition for the first convolution layer.
In some embodiments, in the object detection apparatus, the pruning unit is configured so that a first convolutional layer does not satisfy the third pruning condition if the input information received by that first convolutional layer is the result of element-wise addition of the output information of a second convolutional layer and the output information of another convolutional layer.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method of object detection as described in any one of the above.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs, which when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the object detection method as described in any one of the above.
The embodiments of the application adopt at least one technical scheme that achieves the following beneficial effects: by setting a pruning mode related to output information, the convolutional layers of the residual blocks in the basic model built on YOLO-v4 are selectively pruned to obtain the target detection model. Compared with a model built with the original YOLO-v4, the target detection model is smaller in size, still maintains high target detection accuracy, and effectively reduces the amount of computation in the target detection process.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart of a method for target detection according to an embodiment of the present application;
FIG. 2 is a diagram illustrating a residual block structure according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an object detection device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The pruning technique is widely applied in the field of neural networks. When the parameters of the neural network in the target detection model are numerous, but some of the parameters do not contribute much to the final output result and appear redundant, a pruning technique can be used, namely, the redundant parameters are pruned.
Although pruning can reduce the volume of the target detection model, if pruning is performed randomly, the target detection accuracy is reduced, and therefore, how to reasonably prune the target detection model needs to be considered.
The technical idea of the method is that a basic model is built based on YOLO-v4, the residual blocks of a main network of the basic model are selected as pruning objects, and selective pruning is carried out on the convolution layers of the residual blocks, so that the size of a target detection model is reduced, and meanwhile, high target detection precision can be kept.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a target detection method according to an embodiment of the present application, as shown in fig. 1, the method includes:
step S110, constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks.
Here, the network structure of the basic model may be adjusted as needed, for example, a number of detection branches may be added, the downsampling structure of the backbone network may be adjusted to reduce the number of downsampling, and the like, which is not limited in this application.
In some embodiments, the backbone network may include a plurality of residual blocks as shown in fig. 2. The residual block shown in fig. 2 includes 7 convolutional layers (conv): convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270. In addition, an element-wise addition (the ⊕ operation in fig. 2) combining the outputs of the corresponding convolutional layers and a splicing operation (concat) are required.
Taking the topmost convolutional layer 210 as an example, 3 × 3 represents the size of the convolutional kernel used by convolutional layer 210, and 128 in the parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described in detail.
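The structural constraint behind the pruning rules can be sketched as plain channel bookkeeping. The sketch below is a minimal pure-Python illustration, not the patent's implementation: the layer names follow fig. 2, but the branch wiring (conv220 as the cross-stage branch, conv230 feeding the residual structure, concat of conv220 and conv260) and all channel counts other than conv210's 128 are assumptions inferred from the description.

```python
def residual_block_channels(c220, c230, c240, c250, c260):
    """Channel bookkeeping for the assumed Fig. 2 residual block.

    conv230 feeds the residual structure (conv240 -> conv250 -> add),
    so the element-wise addition requires conv250's output channel
    count to match conv230's. Raises if the addition would receive
    mismatched inputs."""
    residual_in = c230
    if c250 != residual_in:
        raise ValueError("residual addition channel mismatch")
    added = residual_in          # addition preserves the channel count
    concat = c220 + c260         # concat of the two branches feeds conv270
    return {"residual_in": residual_in, "added": added, "concat": concat}

# With matching counts the block is consistent; pruning conv230's
# output alone would trigger the mismatch, which is why it is kept.
ok = residual_block_channels(c220=64, c230=64, c240=64, c250=64, c260=64)
```

Running the helper with a pruned conv230 (e.g. `c230=48` while `c250=64`) raises the mismatch error, mirroring the reason the second convolutional layer is left unpruned.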
Step S120, setting a first pruning condition related to the output information; and taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer.
And step S130, pruning each first convolution layer, and not pruning each second convolution layer to obtain the target detection model.
Here, pruning may include pruning of input channels and/or pruning of output channels.
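Mechanically, input-channel and output-channel pruning are slices along different axes of a convolution weight tensor of shape (out_channels, in_channels, k, k). The numpy sketch below is an illustration of that distinction only; the helper name and shapes are hypothetical and not taken from the patent.

```python
import numpy as np

def prune_conv_weights(w, keep_out=None, keep_in=None):
    """Slice a conv weight tensor of shape (out_ch, in_ch, k, k).

    keep_out / keep_in are lists of channel indices to retain;
    None leaves that dimension untouched."""
    if keep_out is not None:
        w = w[keep_out, :, :, :]   # output-channel pruning (axis 0)
    if keep_in is not None:
        w = w[:, keep_in, :, :]    # input-channel pruning (axis 1)
    return w

w = np.zeros((128, 64, 3, 3))
pruned = prune_conv_weights(w, keep_out=[0, 1, 2, 3], keep_in=list(range(32)))
```

Note that output-channel pruning of one layer forces matching input-channel pruning of the layer that consumes it, which is the consistency concern the conditions below address.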
And step S140, inputting the detection image into a target detection model to obtain a target detection result. Specifically, the target of detection may be a defect of a vehicle, a pedestrian, or an industrial product, or the like.
It can be seen that, in the method shown in fig. 1, the convolution layer of the residual block in the basic model constructed based on YOLO-v4 is selectively pruned by setting a pruning mode related to the output information to obtain the target detection model, and compared with the model constructed by using the original YOLO-v4, the target detection model has a smaller volume, and can still maintain higher target detection accuracy, thereby effectively reducing the calculation amount in the target detection process.
In some embodiments, the target detection method, wherein the taking, as the second convolutional layer, a convolutional layer that does not satisfy the first pruning condition among the residual blocks comprises: and if the output information of one convolutional layer is the input information of the residual structure in the residual block, the convolutional layer does not meet the first pruning condition.
The reason for this is that, in the base model constructed based on YOLO-v4, the residual block has a CSP (Cross Stage Partial) structure, and if the second convolution layer (whose output information is the input information of the residual structure in the residual block) is pruned, the number of channels output to the residual structure may change, which may affect the use of the residual structure.
It should be noted here that the output information of the second convolutional layer may be subjected to batch normalization before being used as the input information of the residual structure in the residual block, but is not subjected to convolution by other convolutional layers.
Taking fig. 2 as an example, convolutional layer 240, convolutional layer 250, and the subsequent element-wise addition constitute a residual structure. Since the output information of convolutional layer 230, shown in the dashed box, is the input information of the residual structure, convolutional layer 230 is the second convolutional layer in the residual block and is not pruned.
In some embodiments of the target detection method, pruning each first convolution layer includes: setting a second pruning condition related to the convolution layer position; judging whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, pruning the output channel of that first convolution layer.
As described above, the residual block has the CSP structure, and the second pruning condition is set according to the convolution layer position, so that the residual structure can be reasonably used.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the second pruning condition includes: and if the first convolution layer is the last convolution layer of the residual structure in the residual block, the first convolution layer does not meet the second pruning condition, otherwise, the first convolution layer meets the second pruning condition.
Taking fig. 2 as an example, convolutional layer 250 is the last convolutional layer of the residual structure in the residual block. To ensure the correctness of the subsequent element-wise addition, the output channel of convolutional layer 250 is not pruned; the output channels of the convolutional layers other than convolutional layer 250 and convolutional layer 230 can be pruned.
Pruning can be carried out in many ways, and any existing technique may be selected. Preferably, in some embodiments of the target detection method, pruning the output channel of the first convolution layer includes: performing network slimming pruning on the output channel of the first convolution layer according to the γ parameters of the BN layer connected after the first convolution layer.
Here, the basic model needs to use BN (Batch Normalization) layers, and sparsity training is first applied to the basic model so that the γ parameters of each BN layer become sparse, thereby satisfying the conditions for network slimming pruning. The specific network slimming operation can be implemented with reference to the prior art and is not described again here.
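The selection step of network-slimming-style pruning can be sketched as thresholding the BN γ magnitudes. This is a hedged illustration of the general technique, not the patent's exact procedure: the function name, the global-quantile threshold, and the prune ratio are all assumptions.

```python
import numpy as np

def slimming_mask(gamma, prune_ratio=0.5):
    """Return a boolean keep-mask over BN channels: keep the channels
    whose |gamma| exceeds the threshold implied by prune_ratio
    (network-slimming style selection)."""
    g = np.abs(np.asarray(gamma, dtype=float))
    threshold = np.quantile(g, prune_ratio)
    keep = g > threshold
    if not keep.any():               # never prune every channel
        keep[np.argmax(g)] = True
    return keep

# After sparsity training, many gammas are driven toward zero;
# those channels are the ones selected for removal.
gamma = [0.9, 0.01, 0.5, 0.002, 0.7, 0.03]
mask = slimming_mask(gamma, prune_ratio=0.5)
```

The resulting mask would then drive the output-channel slicing of the first convolutional layer and the matching input-channel slicing of its consumer.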
In some embodiments, the pruning each first convolution layer in the object detection method includes: setting a third pruning condition related to the input information; judging whether each first convolution layer meets a third pruning condition; if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
The input information here is the counterpart of the output information discussed above. The difference is that output information is produced by a convolutional layer itself, while input information is received by a convolutional layer and may be the output of another network structure, the original picture, and so on. The input channels should therefore be pruned judiciously according to the situation.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the output information of a second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
Since the second convolutional layer is not pruned, the input channels of any convolutional layer receiving the output information of the second convolutional layer are correspondingly left unpruned, to keep the channel counts consistent.
In some embodiments, the target detection method, wherein determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
Of course, the detection image may undergo some preprocessing, i.e., the input information is the tensor representation of the detection image. In these cases the input information is fixed, so the input channels likewise cannot be pruned.
In some embodiments of the target detection method, determining whether each first convolution layer satisfies the third pruning condition includes: if the input information received by a first convolutional layer is the result of element-wise addition of the output information of the second convolutional layer and the output information of another convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
To ensure the normal use of the residual structure, the number of channels of the result of the element-wise addition of the output information of the second convolutional layer and the output information of another convolutional layer is also fixed. Therefore, when this result serves as the input information of a first convolutional layer, the input channels of that first convolutional layer cannot be pruned.
To summarize, taking fig. 2 as an example:
the convolutional layer 210: if the input information is a detection image, the input channel can not be pruned; if the input information is not a detected image, the input channel may prune based on the input information. In either case, the number of output channels of convolutional layer 210 can be pruned according to the γ parameter of the following BN layer.
A convolutional layer 220: the number of input channels thereof may be pruned according to the output information of the convolutional layer 210, and the number of output channels thereof may be pruned according to the γ parameter of the subsequent BN layer.
The convolutional layer 230: the number of input channels and the number of output channels are not pruned.
The convolutional layer 240: the number of input channels is not pruned, and the number of output channels can be pruned according to the gamma parameter of the subsequent BN layer.
Convolutional layer 250: the number of input channels may be pruned based on the output information of convolutional layer 240, and the number of output channels may not be pruned.
Convolutional layer 260: the number of input channels is not pruned, and the number of output channels can be pruned according to the gamma parameter of the subsequent BN layer.
Convolutional layer 270: the number of input channels is pruned according to the concatenated (concat) output of convolutional layer 220 and convolutional layer 260, and the number of output channels can be pruned according to the γ parameter of the subsequent BN layer.
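The per-layer decisions enumerated above can be collected into a single lookup table. The encoding below is an illustrative sketch; the tag names ("keep", "bn_gamma", etc.) are hypothetical labels summarizing the text, not identifiers from the patent.

```python
# Pruning plan for the Fig. 2 residual block, summarizing the list above.
# "bn_gamma" = output channels prunable via the gamma parameters of the
# following BN layer; "follow_X" = input channels pruned to match X's output.
PRUNING_PLAN = {
    "conv210": {"in": "depends_on_input", "out": "bn_gamma"},
    "conv220": {"in": "follow_conv210",   "out": "bn_gamma"},
    "conv230": {"in": "keep",             "out": "keep"},
    "conv240": {"in": "keep",             "out": "bn_gamma"},
    "conv250": {"in": "follow_conv240",   "out": "keep"},
    "conv260": {"in": "keep",             "out": "bn_gamma"},
    "conv270": {"in": "follow_concat",    "out": "bn_gamma"},
}

def output_prunable(layer):
    """True if the layer's output channels may be pruned via BN gamma."""
    return PRUNING_PLAN[layer]["out"] == "bn_gamma"
```

Under this encoding, exactly the two layers protected by the conditions (conv230 as the second convolutional layer, conv250 as the last layer of the residual structure) keep their output channels intact.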
Pruning convolutional layers 210 to 270 as described above yields target detection model 1, while target detection model 2 is built with the original YOLO-v4. Training on the same sample set and testing on the same test set show that target detection model 1 is greatly reduced in size relative to target detection model 2, and is slightly improved on the two indices of mean average precision (mAP) and detection precision, indicating that the pruning does not degrade model performance and may even improve it, which is an unexpected effect.
In addition, an embodiment of the present application further provides an object detection apparatus, which is used for implementing the object detection method as described in any one of the above.
Fig. 3 shows a schematic structural diagram of an object detection apparatus according to an embodiment of the present application. As shown in fig. 3, the object detection device 300 includes:
a building unit 310, configured to build a base model based on YOLO-v4, where a backbone network of the base model includes a plurality of residual blocks.
Here, the network structure of the basic model may be adjusted as needed, for example, a number of detection branches may be added, the downsampling structure of the backbone network may be adjusted to reduce the number of downsampling, and the like, which is not limited in this application.
In some embodiments, the backbone network may include a plurality of residual blocks as shown in fig. 2. The residual block shown in fig. 2 includes 7 convolutional layers (conv), which are convolutional layer 210, convolutional layer 220, convolutional layer 230, convolutional layer 240, convolutional layer 250, convolutional layer 260, and convolutional layer 270, respectively. Taking the topmost convolutional layer 210 as an example, 3 × 3 represents the size of the convolutional kernel used by convolutional layer 210, and 128 in the parentheses represents the number of channels of convolutional layer 210. The remaining convolutional layers are not described in detail.
A pruning unit 320 for setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; and pruning each first convolution layer, and not pruning each second convolution layer to obtain the target detection model.
Here, pruning may include pruning of input channels and/or pruning of output channels.
The detecting unit 330 is configured to input the detected image into the target detection model to obtain a target detection result.
It can be seen that, in the apparatus shown in fig. 3, the convolution layer of the residual block in the basic model constructed based on YOLO-v4 is selectively pruned by setting a pruning method related to the output information to obtain the target detection model, and compared with the model constructed by using the original YOLO-v4, the volume of the target detection model is smaller, and higher target detection accuracy can still be maintained, thereby effectively reducing the calculation amount in the target detection process.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to, if the output information of a convolutional layer is the input information of the residual structure in the residual block, not satisfy the first pruning condition for the convolutional layer.
In some embodiments, in the object detection apparatus, the pruning unit 320 is configured to set a second pruning condition related to the convolution layer position; judge whether each first convolution layer meets the second pruning condition; and if a first convolution layer meets the second pruning condition, prune the output channel of that first convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to, if a first convolutional layer is the last convolutional layer of the residual structure in the residual block, not satisfy the second pruning condition, otherwise, the first convolutional layer satisfies the second pruning condition.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to perform network-slimming pruning on the output channels of a first convolution layer according to the γ parameters of the BN layer connected after that first convolution layer.
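The γ-based output-channel selection can be sketched as follows. This is a minimal, hedged illustration of the network-slimming idea the patent relies on: channels whose BN scaling factor has small magnitude contribute little and are pruned first. The data layout and the keep-ratio hyperparameter are assumptions for illustration, not values taken from the patent.

```python
# Hedged sketch: selecting which output channels of a pruned ("first")
# convolution layer to keep, network-slimming style, using the scaling
# factors (gamma) of the BatchNorm layer that follows the convolution.
# The list-based layout and the keep_ratio value are illustrative
# assumptions, not taken from the patent text.

def select_channels_by_gamma(gammas, keep_ratio=0.5):
    """Return sorted indices of the channels whose |gamma| is largest.

    gammas:     per-channel scaling factors of the BN layer connected
                after the convolution (one value per output channel).
    keep_ratio: fraction of channels to keep (assumed hyperparameter).
    """
    n_keep = max(1, int(len(gammas) * keep_ratio))  # never prune every channel
    # Rank channels by the magnitude of their BN scaling factor.
    ranked = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]), reverse=True)
    return sorted(ranked[:n_keep])

# A BN gamma vector for an 8-channel convolution; the four channels with
# small |gamma| (indices 1, 3, 5, 7) are pruned away.
kept = select_channels_by_gamma([0.9, 0.01, 0.5, 0.02, 0.7, 0.03, 0.8, 0.04], keep_ratio=0.5)
print(kept)  # -> [0, 2, 4, 6]
```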
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to set a third pruning condition related to the input information; to determine whether each first convolution layer satisfies the third pruning condition; and, if a first convolution layer satisfies the third pruning condition, to prune the input channels of that first convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to determine that a first convolution layer does not satisfy the third pruning condition if the input information it receives is the output information of a second convolution layer.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to determine that a first convolution layer does not satisfy the third pruning condition if the input information it receives is the detection image.
In some embodiments, in the target detection apparatus, the pruning unit 320 is configured to determine that a first convolution layer does not satisfy the third pruning condition if the input information it receives is the result of performing an xor calculation on the output information of a second convolution layer and the output information of another convolution layer.
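The three input-side exclusions described above can be combined into one predicate. The following Python sketch is hedged: the string descriptors of an input's source are illustrative assumptions, not identifiers from the patent; the point is only that input-channel pruning is allowed when none of the exclusions applies.

```python
# Hedged sketch combining the three input-side rules above: a "first"
# convolution layer's input channels are pruned only when no exclusion
# applies. The source descriptors ("image", "second_conv", ...) are
# illustrative assumptions.

def input_channels_prunable(input_sources):
    """Return True when the third pruning condition is satisfied.

    input_sources: descriptors of where the layer's input comes from.
    Exclusions (condition NOT satisfied):
      - the input is the detection image itself;
      - the input is the output of an unpruned "second" convolution layer;
      - the input merges a second convolution layer's output with another
        layer's output inside the residual structure.
    """
    excluded = {"image", "second_conv", "residual_merge_with_second_conv"}
    return not any(src in excluded for src in input_sources)

print(input_channels_prunable(["first_conv"]))   # -> True: safe to prune input channels
print(input_channels_prunable(["second_conv"]))  # -> False: upstream layer was not pruned
```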
It can be understood that the target detection apparatus can implement the steps of the target detection method provided in the foregoing embodiment, and the related explanations about the target detection method are applicable to the target detection apparatus, and are not described herein again.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 4, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include a volatile memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The memory is configured to store a program. In particular, the program may include program code comprising computer operating instructions. The memory may include both volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming the target detection apparatus at the logical level. The number of target detection apparatuses shown in fig. 4 does not constitute a limitation of the present application. The processor is configured to execute the program stored in the memory, and is specifically configured to perform the following operations:
constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and inputting the detection image into a target detection model to obtain a target detection result.
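The sequence of operations listed above can be sketched as a small Python pipeline. All names here are illustrative assumptions (the model is stubbed as nested lists so the control flow is visible); this is not an API from the patent.

```python
# Hedged sketch of the operation sequence above: build the base model,
# classify convolution layers per the first pruning condition, prune only
# the "first" layers, and return a model ready for detection. Everything
# is stubbed; all names are illustrative assumptions.

def run_pipeline():
    # 1. Construct the base model: 3 residual blocks, each with one layer
    #    feeding the residual structure (kept) and two pruning candidates.
    model = [
        [{"prunable": False}, {"prunable": True}, {"prunable": True}]
        for _ in range(3)
    ]
    pruned = 0
    for block in model:              # 2-3. selective pruning per block
        for layer in block:
            if layer["prunable"]:    # first pruning condition satisfied
                layer["pruned"] = True
                pruned += 1
    return pruned                    # 4. the pruned model would now run detection

print(run_pipeline())  # -> 6 (two pruned layers in each of three blocks)
```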
The method performed by the target detection apparatus according to the embodiment shown in fig. 1 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, capable of implementing or performing the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method executed by the target detection apparatus in fig. 1, and implement the function of the target detection apparatus in the embodiment shown in fig. 3, which is not described herein again in this embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the object detection apparatus in the embodiment shown in fig. 1, and are specifically configured to perform:
constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks; setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model; and inputting the detection image into a target detection model to obtain a target detection result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of object detection, comprising:
constructing a basic model based on YOLO-v4, wherein a backbone network of the basic model comprises a plurality of residual blocks;
setting a first pruning condition related to the output information;
taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer;
pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model;
and inputting the detection image into the target detection model to obtain a target detection result.
2. The method of claim 1, wherein the taking convolutional layers of the residual blocks that do not satisfy the first pruning condition as second convolutional layers comprises:
and if the output information of one convolutional layer is the input information of the residual structure in the residual block, the convolutional layer does not meet the first pruning condition.
3. The method of claim 1, wherein pruning each first convolutional layer comprises:
setting a second pruning condition related to the convolution layer position;
judging whether each first convolution layer meets the second pruning condition or not;
if a first convolution layer meets the second pruning condition, pruning the output channel of the first convolution layer.
4. The method of claim 3, wherein the determining whether each first convolution layer satisfies the second pruning condition comprises:
and if the first convolution layer is the last convolution layer of the residual structure in the residual block, the first convolution layer does not meet the second pruning condition, otherwise, the first convolution layer meets the second pruning condition.
5. The method of claim 3, wherein pruning the output channel of the first convolutional layer comprises:
and performing network slimming pruning on the output channel of the first convolution layer according to the gamma parameter of the BN layer connected to the first convolution layer.
6. The method of claim 1, wherein pruning each first convolutional layer comprises:
setting a third pruning condition related to the input information;
judging whether each first convolution layer meets the third pruning condition or not;
if a first convolution layer meets the third pruning condition, pruning the input channel of the first convolution layer.
7. The method of claim 6, wherein the determining whether each first convolution layer satisfies the third pruning condition comprises:
if the input information received by a first convolutional layer is the output information of the second convolutional layer, the first convolutional layer does not satisfy the third pruning condition.
8. The method of claim 6, wherein the determining whether each first convolution layer satisfies the third pruning condition comprises:
if the input information received by a first convolutional layer is a detection image, the first convolutional layer does not satisfy the third pruning condition.
9. The method of claim 6, wherein the determining whether each first convolution layer satisfies the third pruning condition comprises:
and if the input information received by one first convolution layer is the result of the XOR calculation of the output information of the second convolution layer and the output information of the other convolution layer, the first convolution layer does not satisfy the third pruning condition.
10. An object detection device, comprising:
a building unit, configured to build a base model based on YOLO-v4, where a backbone network of the base model includes a plurality of residual blocks;
a pruning unit for setting a first pruning condition related to the output information; taking the convolution layer which meets the first pruning condition in each residual block as a first convolution layer, and taking the convolution layer which does not meet the first pruning condition in each residual block as a second convolution layer; pruning each first convolution layer, and not pruning each second convolution layer to obtain a target detection model;
and the detection unit is used for inputting the detection image into the target detection model to obtain a target detection result.
CN202110148533.4A 2021-02-03 2021-02-03 Target detection method and device Pending CN112836751A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110148533.4A CN112836751A (en) 2021-02-03 2021-02-03 Target detection method and device
PCT/CN2021/130259 WO2022166294A1 (en) 2021-02-03 2021-11-12 Target detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110148533.4A CN112836751A (en) 2021-02-03 2021-02-03 Target detection method and device

Publications (1)

Publication Number Publication Date
CN112836751A (en) 2021-05-25

Family

ID=75931845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110148533.4A Pending CN112836751A (en) 2021-02-03 2021-02-03 Target detection method and device

Country Status (2)

Country Link
CN (1) CN112836751A (en)
WO (1) WO2022166294A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775381B (en) * 2022-12-15 2023-10-20 华洋通信科技股份有限公司 Mine electric locomotive road condition identification method under uneven illumination

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460613A (en) * 2018-11-12 2019-03-12 北京迈格威科技有限公司 Model method of cutting out and device
CN109740534A (en) * 2018-12-29 2019-05-10 北京旷视科技有限公司 Image processing method, device and processing equipment
CN110033083A (en) * 2019-03-29 2019-07-19 腾讯科技(深圳)有限公司 Convolutional neural networks model compression method and apparatus, storage medium and electronic device
US20190362235A1 (en) * 2018-05-23 2019-11-28 Xiaofan Xu Hybrid neural network pruning
CN110633747A (en) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 Compression method, device, medium and electronic device for target detector
CN111126501A (en) * 2019-12-26 2020-05-08 厦门市美亚柏科信息股份有限公司 Image identification method, terminal equipment and storage medium
CN111178133A (en) * 2019-12-03 2020-05-19 哈尔滨工程大学 Natural scene image text recognition method based on pruning depth model
CN111210016A (en) * 2018-11-21 2020-05-29 辉达公司 Pruning a neural network containing element-level operations
CN111652370A (en) * 2020-05-28 2020-09-11 成都思晗科技股份有限公司 BatchNormal layer optimization-based YOLO V3 model clipping method
CN111652366A (en) * 2020-05-09 2020-09-11 哈尔滨工业大学 Combined neural network model compression method based on channel pruning and quantitative training
CN111768372A (en) * 2020-06-12 2020-10-13 国网智能科技股份有限公司 Method and system for detecting foreign matters in GIS equipment cavity
CN112052951A (en) * 2020-08-31 2020-12-08 北京中科慧眼科技有限公司 Pruning neural network method, system, equipment and readable storage medium
CN112070051A (en) * 2020-09-16 2020-12-11 华东交通大学 Pruning compression-based fatigue driving rapid detection method
CN112257794A (en) * 2020-10-27 2021-01-22 东南大学 YOLO-based lightweight target detection method
CN112308066A (en) * 2020-10-23 2021-02-02 西安科锐盛创新科技有限公司 License plate recognition system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11354577B2 (en) * 2017-03-15 2022-06-07 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN111008640B (en) * 2019-10-17 2024-03-19 平安科技(深圳)有限公司 Image recognition model training and image recognition method, device, terminal and medium
CN111931901A (en) * 2020-07-02 2020-11-13 华为技术有限公司 Neural network construction method and device
CN112836751A (en) * 2021-02-03 2021-05-25 歌尔股份有限公司 Target detection method and device


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HONGLI LIU, HUIHUI BAI, FEIRAN JIE, MENGMENG ZHANG: "Channel pruning for object detection network", IET 8th International Conference on Wireless, Mobile & Multimedia Networks *
LIU XIANGLONG ET AL. (EDS.): "PaddlePaddle Deep Learning in Action" (飞桨PaddlePaddle深度学习实战), 31 August 2020 *
BAI SHILEI, YIN KEXIN, ZHU JIANQI: "Traffic Sign Detection Algorithm Based on Lightweight YOLOv3" (轻量级YOLOv3的交通标志检测算法), Computer and Modernization (计算机与现代化) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166294A1 (en) * 2021-02-03 2022-08-11 歌尔股份有限公司 Target detection method and apparatus
CN113705775A (en) * 2021-07-29 2021-11-26 浪潮电子信息产业股份有限公司 Neural network pruning method, device, equipment and storage medium
CN116468100A (en) * 2023-03-06 2023-07-21 美的集团(上海)有限公司 Residual pruning method, residual pruning device, electronic equipment and readable storage medium
CN116468100B (en) * 2023-03-06 2024-05-10 美的集团(上海)有限公司 Residual pruning method, residual pruning device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
WO2022166294A1 (en) 2022-08-11

Similar Documents

Publication Publication Date Title
CN112836751A (en) Target detection method and device
CN108460523B (en) Wind control rule generation method and device
CN108845936B (en) AB testing method and system based on massive users
CN112634209A (en) Product defect detection method and device
CN108846749B (en) Partitioned transaction execution system and method based on block chain technology
CN110222936B (en) Root cause positioning method and system of business scene and electronic equipment
CN112949692A (en) Target detection method and device
CN109299276B (en) Method and device for converting text into word embedding and text classification
CN112598321B (en) Risk prevention and control method, system and terminal equipment
CN114694005A (en) Target detection model training method and device, and target detection method and device
CN112991349B (en) Image processing method, device, equipment and storage medium
CN114419679B (en) Data analysis method, device and system based on wearable device data
CN112766397B (en) Classification network and implementation method and device thereof
CN109542785B (en) Invalid bug determination method and device
CN113743618A (en) Time series data processing method and device, readable medium and electronic equipment
CN109740336B (en) Method and device for identifying verification information in picture and electronic equipment
CN109145821B (en) Method and device for positioning pupil image in human eye image
CN109598478B (en) Wind measurement result description document generation method and device and electronic equipment
CN108388982B (en) Task running method and device and electronic equipment
CN114397671B (en) Course angle smoothing method and device of target and computer readable storage medium
CN111046909A (en) Load prediction method and device
CN110018844B (en) Management method and device of decision triggering scheme and electronic equipment
CN113111872B (en) Training method and device of image recognition model, electronic equipment and storage medium
US11846672B2 (en) Method and device for testing system-on-chip, electronic device using method, and computer readable storage medium
CN113344145A (en) Character recognition method, character recognition device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210525