CN112949692A - Target detection method and device


Info

Publication number
CN112949692A
Authority
CN
China
Prior art keywords
target detection
detection
yolo
sampling
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110148538.7A
Other languages
Chinese (zh)
Inventor
张一凡
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goertek Inc
Original Assignee
Goertek Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goertek Inc filed Critical Goertek Inc
Priority to CN202110148538.7A priority Critical patent/CN112949692A/en
Publication of CN112949692A publication Critical patent/CN112949692A/en
Priority to PCT/CN2021/130102 priority patent/WO2022166293A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection


Abstract

The application discloses a target detection method and a target detection device. The method comprises: setting at least one adjustment mode for the down-sampling structure of the YOLO-v4 backbone network based on the characteristics of the target to be detected; adjusting the down-sampling structure of the YOLO-v4 backbone network by using the adjustment mode, and constructing a target detection model based on YOLO-v4; and inputting a detection image into the target detection model, extracting a down-sampling feature map of the detection image with the target detection model, and obtaining a target detection result according to the down-sampling feature map, where the size of the down-sampling feature map is determined by the adjusted down-sampling structure. The benefit of this technical solution is that the accuracy of target detection can be improved by adjusting the down-sampling structure. Taking an industrial defect detection scene as an example, scratches, fine fibers and similar defects are imaged as linear, small-volume targets; if the original down-sampling structure is used to process the detection image, repeated down-sampling significantly reduces the detection performance, and the improved target detection model effectively solves this problem.

Description

Target detection method and device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a target detection method and apparatus.
Background
YOLO (short for You Only Look Once; the term has no established Chinese name in the industry for the time being) is a typical single-stage target detection technology, i.e., information such as the position and category of a target is regressed directly from the original image; the technology has now developed to its fourth version, YOLO-v4. Fig. 1 shows a schematic diagram of the network structure of YOLO-v4, which can be seen to include a down-sampling structure composed of a plurality of down-sampling layers. However, this arrangement has some disadvantages; for example, in industrial defect detection scenarios, some defects are still difficult to identify accurately, so the technology still has room for improvement.
It should be noted that the statements herein merely provide background information related to the present application and may not necessarily constitute prior art.
Disclosure of Invention
The embodiment of the application provides a target detection method and a target detection device, so that the target detection precision is further improved.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a target detection method, including: setting at least one adjusting mode for a down-sampling structure of a backbone network of YOLO-v4 based on the characteristics of a target to be detected; adjusting a down-sampling structure of a backbone network of YOLO-v4 by using an adjusting mode, and constructing a target detection model based on YOLO-v 4; inputting the detection image into a target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method includes: the step size of at least one down-sampling layer in the down-sampling structure is adjusted.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method includes: one or more downsampling layers in the downsampling structure are deleted.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method includes: deleting either the 1/4 down-sampling layer or the 1/32 down-sampling layer.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method further includes: and reducing the number of channels of each network structure originally connected behind the deleted down-sampling layer by half.
In some embodiments, in the target detection method, constructing a YOLO-v 4-based target detection model includes: and adding a detection branch on the basis of the specified down-sampling layer in the adjusted down-sampling structure.
In some embodiments, in the target detection method, constructing a target detection model based on YOLO-v4 further includes: and setting an anchor frame used by each detection branch in the target detection model according to the added detection branches.
In some embodiments, in the target detection method, setting an anchor frame used by each detection branch in the target detection model according to the added detection branches includes: distributing a first preset number of anchor frame groups among the detection branches, where the first preset number is the number of anchor frame groups used by the original YOLO-v4 backbone network and each detection branch is assigned at least one anchor frame group; or increasing the number of anchor frame groups from the first preset number to a second preset number, and evenly distributing the second preset number of anchor frame groups among the detection branches.
In some embodiments, the target detection method further comprises: and pruning the target detection model.
In a second aspect, an embodiment of the present application further provides an object detection apparatus, including: the adjusting unit is used for setting at least one adjusting mode for the down-sampling structure of the backbone network of the YOLO-v4 based on the characteristics of the target to be detected; the system comprises a construction unit, a target detection unit and a target detection unit, wherein the construction unit is used for adjusting a down-sampling structure of a backbone network of YOLO-v4 by using an adjustment mode and constructing a target detection model based on YOLO-v 4; the detection unit is used for inputting the detection image into the target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
In some embodiments, in the object detection apparatus, the construction unit is configured to adjust a step size of at least one down-sampling layer in the down-sampling structure.
In some embodiments, in the object detection apparatus, the construction unit is configured to delete one or more downsampled layers in the downsampled structure.
In some embodiments, in the target detection apparatus, the construction unit is configured to delete either the 1/4 down-sampling layer or the 1/32 down-sampling layer.
In some embodiments, in the target detection apparatus, the construction unit is configured to reduce the number of channels of each network structure originally connected after the deleted downsampling layer by half.
In some embodiments, in the object detection apparatus, the construction unit is configured to add the detection branch based on a specified downsampling layer in the adjusted downsampling structure.
In some embodiments, in the object detection apparatus, the construction unit is configured to set an anchor frame used by each detection branch in the object detection model according to the added detection branches.
In some embodiments, in the target detection apparatus, the construction unit is configured to distribute a first preset number of anchor frame groups among the detection branches, where the first preset number is the number of anchor frame groups used by the original YOLO-v4 backbone network and each detection branch is assigned at least one anchor frame group; or to increase the number of anchor frame groups from the first preset number to a second preset number and evenly distribute the second preset number of anchor frame groups among the detection branches.
In some embodiments, the object detection apparatus further comprises: and the pruning unit is used for carrying out pruning processing on the target detection model.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method of object detection as described in any one of the above.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs, which when executed by an electronic device including a plurality of application programs, cause the electronic device to perform the object detection method as described in any one of the above.
The embodiments of the application adopt at least one technical solution that can achieve the following beneficial effects: YOLO-v4 is selected to construct a target detection model, and an adjustment mode is set for the down-sampling structure based on the characteristics of the target to be detected, so that the adjusted target detection model produces a down-sampling feature map of adjusted size, on the basis of which higher detection accuracy can be obtained. Taking an industrial defect detection scene as an example, scratches, fine fibers and the like are imaged as linear targets with small volumes; if the original down-sampling structure is used to process the detection image, repeated down-sampling significantly reduces the detection performance, and the improved target detection model effectively solves this problem.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 shows a schematic diagram of the network structure of YOLO-v 4;
FIG. 2 is a diagram of the feature map size output by each down-sampling layer, shown on the basis of the network structure of FIG. 1;
FIG. 3 shows a schematic flow diagram of a target detection method according to an embodiment of the present application;
FIG. 4 is a diagram of the feature map sizes output by each down-sampling layer in the network structure of a target detection model according to one embodiment of the present application;
FIG. 5 is a diagram of the feature map sizes output by each down-sampling layer in the network structure of a target detection model according to another embodiment of the present application;
FIG. 6 is a diagram of the feature map sizes output by each down-sampling layer in the network structure of a target detection model according to yet another embodiment of the present application;
FIG. 7 is a diagram of the feature map sizes output by each down-sampling layer in the network structure of a target detection model according to yet another embodiment of the present application;
FIG. 8 illustrates a network diagram of an object detection model according to one embodiment of the present application;
FIG. 9 shows a network diagram of an object detection model according to another embodiment of the present application;
FIG. 10 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 2 shows the feature map size output by each down-sampling layer on the basis of the network structure shown in fig. 1. As shown in fig. 2, when the input image has a size of 416 × 416 (pixels, the same below) and is split into the three RGB channels (i.e., 416 × 416 × 3 in fig. 2; the numbers marked on the following layers have the same meaning and are not explained one by one), the input image is first processed into a 416 × 416 × 32 feature map (the corresponding network structure is not shown in fig. 1), and then passes in sequence through the 1/2 down-sampling layer to obtain a 208 × 208 feature map, the 1/4 down-sampling layer to obtain a 104 × 104 feature map, the 1/8 down-sampling layer to obtain a 52 × 52 feature map, the 1/16 down-sampling layer to obtain a 26 × 26 feature map, and the 1/32 down-sampling layer to obtain a 13 × 13 feature map.
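For illustration only (not part of the original patent text), the size progression described above can be reproduced with a short Python sketch; the assumption that every down-sampling layer halves the spatial resolution with a stride-2 operation follows the standard YOLO-v4 backbone:

    # Minimal sketch: reproduce the feature map sizes in fig. 2, assuming each
    # down-sampling layer is a stride-2 operation that halves the spatial size
    # (an assumption consistent with the standard YOLO-v4 backbone).
    input_size = 416  # input image: 416 x 416 x 3 (RGB)

    size = input_size
    print(f"stem output: {size} x {size} x 32")
    for name in ["1/2", "1/4", "1/8", "1/16", "1/32"]:
        size //= 2  # stride-2 down-sampling halves height and width
        print(f"{name} down-sampling layer output: {size} x {size}")
    # prints 208, 104, 52, 26 and 13, matching fig. 2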
The reason why the original YOLO-v4 is designed in this way is that a feature map with a smaller size can be obtained through multiple times of downsampling, and the inference speed of the model can be greatly improved by carrying out target detection on the smaller feature map.
However, the inventor has found that while multiple down-sampling operations do not reduce detection accuracy for common natural objects that are imaged as planar and relatively large, for objects that are imaged as linear and small in volume (in particular, fine fiber defects and fine impurity defects in industrial inspection), multiple down-sampling operations significantly reduce the detection performance of the model.
That is, the idea of the prior art is to down-sample as much as possible, and the design idea of the present invention is to reduce down-sampling, thereby achieving an improvement in accuracy.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
FIG. 3 shows a schematic flow diagram of a target detection method according to an embodiment of the present application. As shown in fig. 3, the method includes:
step S310, setting at least one adjusting mode for the down-sampling structure of the backbone network of YOLO-v4 based on the characteristics of the target to be detected.
The target to be detected can be any of various objects to be detected, such as vehicles or defects. The "characteristics" here do not refer to tensor features obtained with a neural network, but to appearance characteristics such as being elongated or small in size. For the purpose of distinction, the tensor features obtained by the neural network are hereinafter referred to as "feature maps".
As described above, for a target to be detected that is imaged as linear and small in volume, too many down-sampling operations reduce the detection accuracy; therefore, the adjustment mode here may be one that reduces the number or degree of down-sampling performed by the down-sampling structure.
And S320, adjusting the downsampling structure of the backbone network of the YOLO-v4 by using an adjusting mode, and constructing a target detection model based on the YOLO-v 4.
Step S330, inputting the detection image into a target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
Here, the specific detection method of YOLO-v4 is not changed: referring to fig. 1 and fig. 2, the detection branches led out from the down-sampling layers obtain a target detection result from the extracted down-sampling feature maps, using anchor frame groups and operations such as up-sampling, concatenation and convolution. To improve the detection effect, adding a detection branch can also be considered; a corresponding embodiment will be described later.
It can be seen that the method shown in fig. 3 can improve target detection accuracy by adjusting the down-sampling structure. Taking an industrial defect detection scene as an example, scratches, fine fibers and the like are imaged as linear targets with small volumes; if the original down-sampling structure is used to process the detection image, repeated down-sampling significantly reduces the detection performance, and the improved target detection model effectively solves this problem.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method includes: the step size of at least one down-sampling layer in the down-sampling structure is adjusted.
For example, in the YOLO-v4 backbone network, the step size (stride) of the 1/8 down-sampling layer is 2; as can be seen from fig. 2, taking a detection image of size 416 × 416 as an example, the down-sampling feature map obtained by the 1/8 down-sampling layer has a size of 52 × 52. If the step size is adjusted from 2 to 1, the effect shown in fig. 4 is obtained: still taking a 416 × 416 detection image as an example, the down-sampling feature map obtained by the 1/8 down-sampling layer becomes 104 × 104, the same size as that obtained by the 1/4 down-sampling layer (only the number of channels changes).
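As a non-authoritative sketch of this adjustment mode (the patent gives no code, so the layer layout, channel counts and PyTorch implementation below are assumptions), the stride change could look like this:

    import torch.nn as nn

    def downsample_layer(in_ch, out_ch, stride=2):
        # Assumed implementation of a down-sampling layer: a single 3x3 convolution
        # followed by batch normalization and a leaky ReLU, as in CSPDarknet-53.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1),
        )

    # Original 1/8 down-sampling layer: stride 2, turns a 104 x 104 map into 52 x 52.
    original_1_8 = downsample_layer(128, 256, stride=2)

    # Adjustment mode: step size changed from 2 to 1, so a 104 x 104 input stays
    # 104 x 104 (only the number of channels changes), as in fig. 4.
    adjusted_1_8 = downsample_layer(128, 256, stride=1)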
The target detection model constructed from the original YOLO-v4 was used as comparative example 1, and a model that is identical except that the step size of the 1/8 down-sampling layer is set to 1 was used as embodiment 1. After training on the same sample set, both models were used to detect the images in an experimental set. The experimental data show that embodiment 1 outperforms comparative example 1 on multiple indicators: an advantage of about 6 percentage points in mean average precision (mAP), about 1 percentage point in recall, and about 4 percentage points in precision.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method includes: one or more downsampling layers in the downsampling structure are deleted.
Deleting a down-sampling layer reduces the number of down-sampling operations very directly. However, fewer down-sampling operations also lead to relatively large down-sampling feature maps, which in turn increases the training and inference time of the target detection model.
The inventor has found a more balanced scheme through experiments. In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method includes: deleting either the 1/4 down-sampling layer or the 1/32 down-sampling layer. FIG. 5 shows the feature map size output by each down-sampling layer after the 1/4 down-sampling layer is removed; fig. 6 shows the feature map size output by each down-sampling layer after the 1/32 down-sampling layer is removed.
The target detection model corresponding to fig. 5 is 255 MB, and the target detection model corresponding to fig. 6 is only 73.7 MB; compared with the 256 MB target detection model of comparative example 1 above, the model size is reduced, saving storage space on the equipment where the target detection model is deployed.
In some embodiments, in the target detection method, adjusting the downsampling structure of the YOLO-v4 backbone network by using an adjustment method further includes: reducing by half the number of channels of each network structure originally connected after the deleted down-sampling layer. For example, fig. 7 shows the feature map size output by each down-sampling layer after the 1/4 down-sampling layer is removed and the number of channels of each subsequent network structure is halved.
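The following sketch (illustrative only; the backbone is reduced here to a plain chain of stride-2 convolutions with CSPDarknet-53-like channel counts, which is an assumption) shows how deleting a down-sampling layer and halving the channels of the structures after it could be combined:

    import torch.nn as nn

    # Stage name and output channels of each down-sampling layer (assumed values).
    STAGES = [("1/2", 64), ("1/4", 128), ("1/8", 256), ("1/16", 512), ("1/32", 1024)]

    def build_adjusted_backbone(delete_stage="1/4", halve_after_delete=True):
        layers, in_ch, deleted = [], 32, False
        for name, out_ch in STAGES:
            if name == delete_stage:
                deleted = True  # adjustment mode: delete this down-sampling layer
                continue
            if deleted and halve_after_delete:
                out_ch //= 2  # halve the channels of structures after the deleted layer
            layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1))
            in_ch = out_ch
        return nn.Sequential(*layers)

    # Backbone corresponding to fig. 7: 1/4 layer removed, later channels halved.
    backbone = build_adjusted_backbone(delete_stage="1/4", halve_after_delete=True)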
The target detection model corresponding to fig. 7 is only 64.2 MB; compared with the 256 MB target detection model of comparative example 1 above, the model size is again reduced, saving storage space on the equipment where the target detection model is deployed.
In some embodiments, in the target detection method, constructing a YOLO-v 4-based target detection model includes: and adding a detection branch on the basis of the specified down-sampling layer in the adjusted down-sampling structure.
Referring to fig. 1, the original YOLO-v4 backbone network has three detection branches, which are respectively connected to the 1/8 down-sampling layer, the 1/16 down-sampling layer and the 1/32 down-sampling layer.
Adding detection branches allows target detection to be performed on more down-sampling feature maps, which can improve detection accuracy. For example, fig. 8 shows a network schematic diagram of a target detection model according to an embodiment of the present application. Compared with the original YOLO-v4 backbone network shown in fig. 1, the model in fig. 8 introduces a new detection branch at the 1/4 down-sampling layer, giving four detection branches in total, so the model can additionally perform target detection on the down-sampling feature map obtained at the 1/4 down-sampling layer and thereby improve accuracy.
It should be noted that the scheme of increasing the detection branches and the scheme of reducing the down-sampling may be used in combination, for example, the down-sampling structure is adjusted first, and then the arrangement manner of the detection branches is adjusted based on the adjusted down-sampling structure. As in the network structure shown in fig. 7, although the 1/4 down-sampling layer is deleted, an additional detection branch can be led out from the 1/2 down-sampling layer; as with the network architecture shown in fig. 5, although the 1/32 downsampling layer is omitted, detection branches may be drawn based on the 1/2 downsampling layer, the 1/4 downsampling layer, the 1/8 downsampling layer, and the 1/16 downsampling layer, among others.
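A minimal sketch of adding a detection branch on a specified down-sampling layer is given below (the 1x1-convolution head, the anchor count of three per branch and the channel numbers are assumptions; the patent does not prescribe them):

    import torch.nn as nn

    def detection_branch(in_channels, num_anchors=3, num_classes=80):
        # Each anchor predicts 4 box offsets + 1 objectness score + class scores.
        return nn.Conv2d(in_channels, num_anchors * (5 + num_classes), kernel_size=1)

    # Original YOLO-v4: branches on the 1/8, 1/16 and 1/32 feature maps.
    # Adjusted model of fig. 8: one extra branch on the 1/4 feature map.
    branches = nn.ModuleDict({
        "p2_1_4": detection_branch(128),    # new branch on the 1/4 feature map
        "p3_1_8": detection_branch(256),
        "p4_1_16": detection_branch(512),
        "p5_1_32": detection_branch(1024),
    })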
In some embodiments, in the target detection method, constructing a target detection model based on YOLO-v4 further includes: and setting an anchor frame used by each detection branch in the target detection model according to the added detection branches.
The anchor frame is a reference frame used during target detection; its specific use can follow the prior art, and only the number and allocation of the anchor frames need to be adjusted according to the scheme provided by the embodiments of the present application.
Referring to fig. 1, in the original YOLO-v4 backbone network, three anchor frame (anchor) groups with serial numbers 0, 1, and 2 are used for the detection branches led out from the 1/8 downsampling layer; three anchor frame groups with serial numbers of 3, 4 and 5 are used for detection branches led out from the 1/16 down-sampling layer; the detection branches from the 1/32 downsampling layer use three anchor block sets numbered 6, 7, and 8.
Since the detection branch is newly added, it is necessary to determine which anchor boxes are used by the newly added detection branch.
In some embodiments, in the target detection method, setting an anchor frame used by each detection branch in the target detection model according to the added detection branches includes: distributing a first preset number of anchor frame groups among the detection branches, where the first preset number is the number of anchor frame groups used by the original YOLO-v4 backbone network and each detection branch is assigned at least one anchor frame group; or increasing the number of anchor frame groups from the first preset number to a second preset number, and evenly distributing the second preset number of anchor frame groups among the detection branches.
Two possible anchor frame allocation schemes are shown, one of which may be selected for use depending on the actual requirements. In one scheme, 9 anchor frame groups used in the backbone network of the original YOLO-v4 may be re-allocated to all current detection branches, for example, referring to fig. 8, the newly added detection branch (derived from 1/4 downsampling layer) uses an anchor frame group with sequence number 0; three anchor frame groups with serial numbers of 1, 2 and 3 are used for detection branches led out from the 1/8 down-sampling layer; three anchor frame groups with serial numbers of 4, 5 and 6 are used for detection branches led out from the 1/16 down-sampling layer; the detection branch from the 1/32 down-sampling layer uses two sets of anchor boxes numbered 7 and 8.
Alternatively, in another scheme, each detection branch may use the same number of anchor frame groups, for example, referring to fig. 9, a network diagram of a target detection model according to another embodiment of the present application is shown, in which three anchor frame groups are also used for the newly added detection branch, and a total of 12 anchor frame groups are used.
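For clarity, the two allocation schemes can be written down as plain mappings from detection branch to anchor-group indices; the concrete indices below simply mirror the fig. 8 and fig. 9 examples above and are otherwise not prescribed by the patent:

    # Scheme 1: redistribute the original 9 anchor frame groups over the 4 branches,
    # each branch receiving at least one group (fig. 8).
    allocation_redistribute = {
        "1/4":  [0],
        "1/8":  [1, 2, 3],
        "1/16": [4, 5, 6],
        "1/32": [7, 8],
    }

    # Scheme 2: increase the total from 9 to 12 groups and assign every branch the
    # same number of groups (fig. 9).
    allocation_expand = {
        "1/4":  [0, 1, 2],
        "1/8":  [3, 4, 5],
        "1/16": [6, 7, 8],
        "1/32": [9, 10, 11],
    }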
In some embodiments, the target detection method further comprises: and pruning the target detection model.
Reducing down-sampling and adding detection branches in the backbone network can improve the detection performance for linear or small-volume targets, but they also increase the amount of computation. To reduce the computation introduced by these operations and to reduce the risk of network overfitting, pruning may be performed on the target detection model.
For example, a network pruning algorithm may be selected: the target detection model is first sparsely trained to obtain sparsified γ parameters (provided that the target detection model uses batch normalization (BN) layers, which carry the γ parameters), and the input channels and/or output channels of the convolutional layers are then pruned based on the sparsified γ parameters.
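A minimal sketch of γ-based channel selection is shown below (network-slimming style; the sparsity penalty, the keep ratio and the thresholding rule are assumptions, since the patent names the idea but not a specific algorithm):

    import torch
    import torch.nn as nn

    def bn_l1_penalty(model: nn.Module) -> torch.Tensor:
        # Sparse training: add an L1 penalty on the gamma (weight) of every BN layer,
        # e.g. loss = task_loss + l1_lambda * bn_l1_penalty(model).
        return sum(m.weight.abs().sum() for m in model.modules()
                   if isinstance(m, nn.BatchNorm2d))

    def channels_to_keep(bn: nn.BatchNorm2d, keep_ratio: float = 0.7) -> torch.Tensor:
        # After sparse training, keep only the channels whose gamma is large enough;
        # the corresponding input/output channels of adjacent convolutions are pruned.
        gammas = bn.weight.detach().abs()
        threshold = torch.quantile(gammas, 1.0 - keep_ratio)
        return gammas >= threshold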
The embodiment of the application also provides a target detection device, which is used for realizing the target detection method.
Specifically, fig. 10 shows a schematic structural diagram of a target detection apparatus according to an embodiment of the present application. As shown in fig. 10, the target detection apparatus 1000 includes:
the adjusting unit 1010 is configured to set at least one adjusting mode for a downsampling structure of the YOLO-v4 backbone network based on characteristics of the target to be detected.
The target to be detected can be any of various objects to be detected, such as vehicles or defects. The "characteristics" here do not refer to tensor features obtained with a neural network, but to appearance characteristics such as being elongated or small in size. For the purpose of distinction, the tensor features obtained by the neural network are hereinafter referred to as "feature maps".
As described above, for a target to be detected that is imaged as linear and small in volume, too many down-sampling operations reduce the detection accuracy; therefore, the adjustment mode here may be one that reduces the number or degree of down-sampling performed by the down-sampling structure.
The building unit 1020 is configured to adjust a downsampling structure of the YOLO-v4 backbone network by using an adjustment mode, and build a target detection model based on the YOLO-v 4.
The detection unit 1030 is configured to input the detection image into the target detection model, extract a downsampling feature map of the detection image by the target detection model, and obtain a target detection result according to the downsampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
Here, the specific detection method of YOLO-v4 is not changed: referring to fig. 1 and fig. 2, the detection branches led out from the down-sampling layers obtain a target detection result from the extracted down-sampling feature maps, using anchor frame groups and operations such as up-sampling, concatenation and convolution. To improve the detection effect, adding a detection branch can also be considered; a corresponding embodiment will be described later.
It can be seen that the apparatus shown in fig. 10 can improve target detection accuracy by adjusting the down-sampling structure. Taking an industrial defect detection scene as an example, scratches, fine fibers and the like are imaged as linear targets with small volumes; if the original down-sampling structure is used to process the detection image, repeated down-sampling significantly reduces the detection performance, and the improved target detection model effectively solves this problem.
In some embodiments, in the target detection apparatus, the constructing unit 1020 is configured to adjust a step size of at least one down-sampling layer in the down-sampling structure.
In some embodiments, in the object detection apparatus, the constructing unit 1020 is configured to delete one or more downsampled layers in the downsampled structure.
In some embodiments, in the target detection apparatus, the construction unit 1020 is configured to delete either the 1/4 down-sampling layer or the 1/32 down-sampling layer.
In some embodiments, in the target detection apparatus, the constructing unit 1020 is configured to reduce the number of channels of each network structure originally connected after the deleted downsampling layer by half.
In some embodiments, in the target detection apparatus, the constructing unit 1020 is configured to add the detection branch based on a specified downsampling layer in the adjusted downsampling structure.
In some embodiments, in the target detection apparatus, the constructing unit 1020 is configured to set an anchor frame used by each detection branch in the target detection model according to the added detection branches.
In some embodiments, in the target detection apparatus, the constructing unit 1020 is configured to distribute a first preset number of anchor frame groups among the detection branches, where the first preset number is the number of anchor frame groups used by the original YOLO-v4 backbone network and each detection branch is assigned at least one anchor frame group; or to increase the number of anchor frame groups from the first preset number to a second preset number and evenly distribute the second preset number of anchor frame groups among the detection branches.
In some embodiments, the object detection apparatus further comprises: and the pruning unit is used for carrying out pruning processing on the target detection model.
It can be understood that the target detection apparatus can implement the steps of the target detection method provided in the foregoing embodiments, and the related explanations of the target detection method are applicable to the target detection apparatus and are not repeated here.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 11, at the hardware level, the electronic device includes a processor and optionally an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 11, but that does not indicate only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code including computer operating instructions. The memory may include both memory and non-volatile storage and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the target detection apparatus at the logical level; fig. 11 does not limit the number of target detection apparatuses in the present application. The processor is configured to execute the program stored in the memory, and is specifically configured to perform the following operations:
setting at least one adjusting mode for a down-sampling structure of a backbone network of YOLO-v4 based on the characteristics of a target to be detected; adjusting a down-sampling structure of a backbone network of YOLO-v4 by using an adjusting mode, and constructing a target detection model based on YOLO-v 4; inputting the detection image into a target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
The method performed by the target detection apparatus according to the embodiment shown in fig. 3 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps and logic blocks disclosed in the embodiments of the present application may thus be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as a random-access memory, a flash memory, a read-only memory, a programmable or electrically erasable programmable read-only memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further execute the method executed by the target detection apparatus in fig. 3 and implement the functions of the target detection apparatus in the embodiment shown in fig. 10, which is not repeated here in this embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions that, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the target detection apparatus in the embodiment shown in fig. 3, and are specifically configured to perform:
setting at least one adjusting mode for a down-sampling structure of a backbone network of YOLO-v4 based on the characteristics of a target to be detected; adjusting a down-sampling structure of a backbone network of YOLO-v4 by using an adjusting mode, and constructing a target detection model based on YOLO-v 4; inputting the detection image into a target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random-access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of object detection, comprising:
setting at least one adjusting mode for a down-sampling structure of a backbone network of YOLO-v4 based on the characteristics of a target to be detected;
adjusting the down-sampling structure of the backbone network of the YOLO-v4 by using the adjusting mode, and constructing a target detection model based on the YOLO-v 4;
inputting a detection image into the target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
2. The method of claim 1, wherein the adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment manner comprises:
and adjusting the step size of at least one down-sampling layer in the down-sampling structure.
3. The method of claim 1, wherein the adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment manner comprises:
deleting one or more downsampling layers in the downsampling structure.
4. The method of claim 3, wherein the adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment manner comprises:
any of the 1/4 downsampling layers and 1/32 downsampling layers were deleted.
5. The method of claim 3, wherein the adjusting the downsampling structure of the YOLO-v4 backbone network using the adjustment method further comprises:
and reducing the number of channels of each network structure originally connected behind the deleted down-sampling layer by half.
6. The method of claim 1, wherein constructing a YOLO-v 4-based target detection model comprises:
and adding a detection branch on the basis of the specified down-sampling layer in the adjusted down-sampling structure.
7. The method of claim 6, wherein constructing a YOLO-v 4-based target detection model further comprises:
and setting an anchor frame used by each detection branch in the target detection model according to the added detection branches.
8. The method of claim 7, wherein setting an anchor frame used by each detection branch in the target detection model according to the added detection branches comprises:
distributing a first preset number of anchor frame groups among the detection branches, wherein the first preset number is the number of anchor frame groups used by the original YOLO-v4 backbone network, and each detection branch is assigned at least one anchor frame group;
or,
increasing the number of anchor frame groups from the first preset number to a second preset number, and evenly distributing the second preset number of anchor frame groups among the detection branches.
9. The method of any one of claims 1 to 8, further comprising:
and pruning the target detection model.
10. An object detection device, comprising:
the adjusting unit is used for setting at least one adjusting mode for the down-sampling structure of the backbone network of the YOLO-v4 based on the characteristics of the target to be detected;
the construction unit is used for adjusting the downsampling structure of the backbone network of the YOLO-v4 by using the adjusting mode, and constructing a target detection model based on the YOLO-v 4;
the detection unit is used for inputting a detection image into the target detection model, extracting a down-sampling feature map of the detection image by the target detection model, and obtaining a target detection result according to the down-sampling feature map; the size of the downsampled feature map is determined according to the adjusted downsampled structure.
CN202110148538.7A 2021-02-03 2021-02-03 Target detection method and device Pending CN112949692A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110148538.7A CN112949692A (en) 2021-02-03 2021-02-03 Target detection method and device
PCT/CN2021/130102 WO2022166293A1 (en) 2021-02-03 2021-11-11 Target detection method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110148538.7A CN112949692A (en) 2021-02-03 2021-02-03 Target detection method and device

Publications (1)

Publication Number Publication Date
CN112949692A true CN112949692A (en) 2021-06-11

Family

ID=76242151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110148538.7A Pending CN112949692A (en) 2021-02-03 2021-02-03 Target detection method and device

Country Status (2)

Country Link
CN (1) CN112949692A (en)
WO (1) WO2022166293A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661614B (en) * 2022-12-09 2024-05-24 江苏稻源科技集团有限公司 Target detection method based on lightweight YOLO v1
CN116363124B (en) * 2023-05-26 2023-08-01 南京杰智易科技有限公司 Steel surface defect detection method based on deep learning


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images
CN110633594A (en) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 Target detection method and device
CN111860064B (en) * 2019-04-30 2023-10-20 杭州海康威视数字技术股份有限公司 Video-based target detection method, device, equipment and storage medium
CN111462050B (en) * 2020-03-12 2022-10-11 上海理工大学 YOLOv3 improved minimum remote sensing image target detection method and device and storage medium
CN112949692A (en) * 2021-02-03 2021-06-11 歌尔股份有限公司 Target detection method and device

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170196509A1 (en) * 2014-06-25 2017-07-13 Canary Medical Inc. Devices, systems and methods for using and monitoring heart valves
CN107316054A (en) * 2017-05-26 2017-11-03 昆山遥矽微电子科技有限公司 Non-standard character recognition methods based on convolutional neural networks and SVMs
CN110632608A (en) * 2018-06-21 2019-12-31 北京京东尚科信息技术有限公司 Target detection method and device based on laser point cloud
CN109784386A (en) * 2018-12-29 2019-05-21 天津大学 A method of it is detected with semantic segmentation helpers
CN110503070A (en) * 2019-08-29 2019-11-26 电子科技大学 Traffic automation monitoring method based on Aerial Images object detection process technology
CN111488804A (en) * 2020-03-19 2020-08-04 山西大学 Labor insurance product wearing condition detection and identity identification method based on deep learning
CN111553406A (en) * 2020-04-24 2020-08-18 上海锘科智能科技有限公司 Target detection system, method and terminal based on improved YOLO-V3
CN111738987A (en) * 2020-06-01 2020-10-02 湖南品信生物工程有限公司 Automatic identification method and device for multitask cervical cancer cells
CN111899227A (en) * 2020-07-06 2020-11-06 北京交通大学 Automatic railway fastener defect acquisition and identification method based on unmanned aerial vehicle operation
CN111967480A (en) * 2020-09-07 2020-11-20 上海海事大学 Multi-scale self-attention target detection method based on weight sharing
CN112200773A (en) * 2020-09-17 2021-01-08 苏州慧维智能医疗科技有限公司 Large intestine polyp detection method based on encoder and decoder of cavity convolution
CN112232371A (en) * 2020-09-17 2021-01-15 福州大学 American license plate recognition method based on YOLOv3 and text recognition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
PAOLO GALEONE: "TensorFlow 2.0 Neural Network Practice" (Chinese edition), 30 June 2020
LIU Xiabi, MA Xiaohong, GAO Yixuan: "Artificial Intelligence: Machine Learning and Neural Networks" (in Chinese), 31 August 2020
MING Yue: "Multi-Source Visual Information Perception and Recognition" (in Chinese), Beijing University of Posts and Telecommunications Press
LI Xin, ZHOU Wei, DUAN Zhemin: "Microprocessor System-Level On-Chip Temperature Sensing Technology" (in Chinese), 30 March 2019
FEI Linyu: "Research on Video-Based Pedestrian Tracking Methods" (in Chinese), China Master's Theses Full-Text Database (Information Science and Technology), 15 December 2020 (2020-12-15)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022166293A1 (en) * 2021-02-03 2022-08-11 歌尔股份有限公司 Target detection method and apparatus
CN113962931A (en) * 2021-09-08 2022-01-21 宁波海棠信息技术有限公司 Foreign matter defect detection method for magnetic reed switch

Also Published As

Publication number Publication date
WO2022166293A1 (en) 2022-08-11

Similar Documents

Publication Publication Date Title
CN112949692A (en) Target detection method and device
CN108305158B (en) Method, device and equipment for training wind control model and wind control
KR102631381B1 (en) Convolutional neural network processing method and apparatus
CN107391527A (en) A kind of data processing method and equipment based on block chain
CN107679700A (en) Business flow processing method, apparatus and server
CN111966334B (en) Service processing method, device and equipment
CN109145003B (en) Method and device for constructing knowledge graph
CN113298050B (en) Lane line recognition model training method and device and lane line recognition method and device
WO2022166294A1 (en) Target detection method and apparatus
CN108920183B (en) Service decision method, device and equipment
CN109784207B (en) Face recognition method, device and medium
CN112766397A (en) Classification network and implementation method and device thereof
CN113763412A (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111080309B (en) Data processing method, device and equipment for multiple objects or multiple models
CN112749602A (en) Target query method, device, equipment and storage medium
CN107368281B (en) Data processing method and device
CN116185545A (en) Page rendering method and device
CN113344145A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN114359935A (en) Model training and form recognition method and device
CN112330666B (en) Image processing method, system, device and medium based on improved twin network
CN115545938B (en) Method, device, storage medium and equipment for executing risk identification service
CN115034386A (en) Service execution method, device, storage medium and electronic equipment
CN114743004A (en) Multi-scale pavement full-factor semantic segmentation method and device, electronic equipment and storage medium
CN115964449A (en) Vehicle track simplifying method and device, storage medium and electronic equipment
CN116935176A (en) Image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210611)