CN111967452A - Target detection method, computer equipment and readable storage medium - Google Patents

Target detection method, computer equipment and readable storage medium

Info

Publication number
CN111967452A
Authority
CN
China
Prior art keywords
target
output
output layer
network model
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011129543.5A
Other languages
Chinese (zh)
Other versions
CN111967452B (en)
Inventor
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Xinmai Microelectronics Co ltd
Original Assignee
Hangzhou Xiongmai Integrated Circuit Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Xiongmai Integrated Circuit Technology Co Ltd filed Critical Hangzhou Xiongmai Integrated Circuit Technology Co Ltd
Priority to CN202011129543.5A priority Critical patent/CN111967452B/en
Publication of CN111967452A publication Critical patent/CN111967452A/en
Application granted granted Critical
Publication of CN111967452B publication Critical patent/CN111967452B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection method, a computer device and a readable storage medium, relating to the technical field of target detection. In the technical scheme provided by the invention, the first step calculates the loss of the first output feature map of each stage using the classification labels and updates the network by backpropagation; the second step filters the first output feature map of the same stage, decodes the filtered result into classification labels that serve as labels for the second output feature map, calculates the classification loss, and again updates the network by backpropagation. The first and second steps are cycled continuously to iteratively optimize the network, improving the detection performance of the single-stage end-to-end detection network.

Description

Target detection method, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method, computer equipment and a readable storage medium.
Background
In the prior art, deep learning methods are generally used for road target detection. Current algorithms based on a region proposal network (RPN) include R-CNN, Fast R-CNN, Faster R-CNN, Cascade R-CNN, and the like. Single-stage end-to-end algorithms with anchor boxes include the YOLO series, the SSD series, and the like. Single-stage end-to-end anchor-free algorithms include FCOS, CenterNet, CornerNet, and the like.
However, in the prior art, multi-stage detection networks with a region proposal network (RPN) achieve high target detection accuracy but at much higher time complexity, making them unsuitable for embedded deployment. Single-stage end-to-end detection networks are fast and well suited to embedded deployment, but their detection accuracy is poorer. The present invention therefore improves the single-stage end-to-end detection network to raise its detection rate.
Disclosure of Invention
In order to solve the foregoing problems, the present invention provides a target detection method, which improves the detection performance of a single-stage end-to-end detection network.
In order to achieve the purpose, the invention adopts the following technical scheme:
a target detection method for detecting road targets, comprising the steps of:
acquiring road picture data and preprocessing it for use as samples;
establishing a network model, training the network model with the samples, and detecting road targets with the trained network model;
characterized in that the network model comprises a target frame output layer and at least two target output layers, each target output layer outputting a feature map,
the network model training method comprises the following steps:
first update: calculating the loss of the feature map output by the first target output layer, outputting the feature map output by the first target output layer to the next target output layer, and updating the network model by backpropagation;
cyclic update: the next target output layer filters the received feature map; the filtered result is used as label information for the feature map output by this target output layer; the loss of the feature map output by this target output layer is calculated according to the label information; the feature map output by this target output layer is output to the next target output layer; and the network model is updated by backpropagation;
and repeating the cyclic update step until the last target output layer.
Optionally, before the loss of the feature map output by the target output layer is calculated, the targets in the sample are classified to obtain classification labels, and the loss of the feature map output by the target output layer is then calculated according to the following formula:
E_loss_class = −∑_i L_i · log(P_i)
where E_loss_class is the classification loss, L_i is the classification label in one-hot form, and P_i is the output value of the network model.
Optionally, classifying the targets in the sample includes the following steps:
calculating the IOU between the target frame output by the target frame output layer and the real target frame of the target in the sample according to the following formula:
IOU = S_inter / (S_gt + S_box − S_inter), where S_inter = max(0, min(gt_x2, box_x2) − max(gt_x1, box_x1)) × max(0, min(gt_y2, box_y2) − max(gt_y1, box_y1)) is the overlap area of the two frames, and S_gt and S_box are the areas of the real target frame and the output target frame respectively
where (gt_x1, gt_y1) and (gt_x2, gt_y2) are the upper-left and lower-right points of the real target frame of the target in the sample, and (box_x1, box_y1) and (box_x2, box_y2) are the upper-left and lower-right points of the target frame output by the target frame output layer;
determining the classification of the target in the sample according to the IOU, and marking a classification label:
label = pos_class if IOU > thresh, and label = neg_class otherwise
where neg_class denotes a negative sample frame, pos_class denotes a positive sample frame, and thresh is the threshold distinguishing positive from negative sample frames.
Optionally, when the network model is trained with the samples, the loss of the target frame output layer is calculated according to the following formula:
[Equation image: loss formula E_loss_frame for the target frame output layer]
where E_loss_frame is the loss of the target frame output layer.
Optionally, the feature map output by the target output layer comprises a class axis, and filtering the received feature map includes the following steps:
obtaining the maximum value along the class axis and its corresponding index value;
filtering the index value according to the following formula, wherein the filtered index value is used as a filtered result:
filtered result = f_stage_i_0_index wherever f_stage_i_0_value > thresh' (index values whose maximum does not exceed the threshold are discarded)
where f_stage_i_0_value is the maximum value along the class axis in the first feature map of the i-th stage, f_stage_i_0_index is the index value corresponding to that maximum, and thresh' is the label threshold.
Optionally, when the network model is trained, the feature map output by the target output layer is screened according to the following formula:
[Equation image: screening rule combining f_stage_i_last and f_stage_i_(last-1) against the screening threshold thresh'']
where f_stage_i_out is the screened feature map, f_stage_i_last is the feature map output by the last target output layer, f_stage_i_(last-1) is the feature map output by the target output layer preceding the last one, and thresh'' is the screening threshold;
and traversing the screened feature map and screening out the coordinates of targets whose confidence exceeds the confidence threshold.
Optionally, when the network model is trained, the corresponding coordinates of the target in the sample are decoded from the coordinates of the screened targets according to the following formula:
[Equation image: coordinate-decoding formula mapping the screened coordinates and the predicted offsets, scaled by stride, to target-frame coordinates in the sample]
where x, y are the coordinates of the screened target; x1_offset, y1_offset are the coordinates of the upper-left point of the target frame predicted by the target frame output layer; x2_offset, y2_offset are the coordinates of the lower-right point of the predicted target frame; box_x1, box_y1 are the coordinates of the upper-left point of the target frame output by the target frame output layer; box_x2, box_y2 are the coordinates of its lower-right point; and stride is the step size of the feature map relative to the sample;
and finally performing a non-maximum suppression operation on the output target frames.
Optionally, the network model has several levels of target detection, where different levels of target detection are for targets of different sizes, and each level of target detection includes a target frame output layer and at least two target output layers.
Optionally, acquiring and preprocessing the road picture data includes the following steps:
selecting road picture data in a natural scene;
normalizing the road picture data according to the following formula:
[Equation image: normalization formula applied to the road picture data m]
where m is the road picture data and the result of the formula is the normalized input data;
and randomly scaling the normalized road picture data and randomly cropping it after scaling, the size of each cropped block being 256 × 256; if a cropped block contains no road target, the current block is used as a negative sample.
Optionally, the network model is trained using an Adam optimization method with a base learning rate of 0.001 and a training batch size of 25.
The invention has the following beneficial effects:
according to the technical scheme provided by the invention, the classification label is dynamically assigned to the single-stage target detection, and a multi-step learning method is adopted in the sub-training process, so that the network detection is more stable, the more robust characteristic is learned, and the detection rate of the detection network is improved while the characteristic that the single-stage end-to-end detection network is suitable for embedded end deployment is maintained.
Furthermore, the present invention also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing any of the above methods when executing the computer program.
Meanwhile, the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method of any one of the above.
These features and advantages of the present invention will be disclosed in more detail in the following detailed description. The preferred embodiments and means described are not intended to limit the technical solutions of the present invention. In addition, the features, elements and components appearing hereinafter may occur in the plural and are given different symbols or numerals for convenience of representation, but all represent the same or similar structural or functional parts.
Detailed Description
The technical solutions of the embodiments of the present invention are explained and illustrated below, but the following embodiments are only preferred embodiments of the present invention, not all of them. Based on these embodiments, other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Reference in the specification to "one embodiment" or "an example" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
The first embodiment is as follows:
the embodiment provides a target detection method for detecting a road target, which comprises the following steps:
and acquiring road picture data and preprocessing the road picture data to be used as a sample. In this step, firstly, road picture data in a natural scene is selected; then, normalizing the road picture data according to the following formula:
[Equation image: normalization formula applied to the road picture data m]
where m is the road picture data and the result of the formula is the normalized input data;
and randomly zooming the normalized road picture data to adapt to different target sizes, and performing random Cutout blocks after zooming, wherein the size of each Cutout block is 256 × 256, if the Cutout blocks do not contain the road target, the current Cutout blocks are used as negative samples to be trained so as to increase the negative sample learning of the network, and finally, data enhancement operations such as Gaussian blur, brightness, turnover, Cutout and the like are randomly added.
A network model is established, comprising a target frame output layer and at least two target output layers, each target output layer outputting a feature map. Because the size of road targets varies greatly, the network model has several levels of target detection, different levels being directed to targets of different sizes, and each level comprising a target frame output layer and at least two target output layers. Specifically, the network model provided in this embodiment has two levels of prediction, each with two target output layers; the 1st level predicts road targets with height greater than 8 and less than 48, and the 2nd level predicts road targets with height greater than 48 and less than 256:
[Equation image: level-assignment rule; targets with 8 < h < 48 go to stage1 and targets with 48 < h < 256 go to stage2]
where w is the width of the road target, h is the height of the road target, and stage1 and stage2 respectively refer to the two-stage outputs of the network.
The specific network configuration is as follows:
[Table images: the specific network configuration, listing for each layer the convolution kernel size k, the number of output feature maps n, the stride s, BatchNormalization, and the activation function]
where k denotes the convolution kernel size, n the number of output convolution feature maps, and s the convolution stride; Bn denotes the BatchNormalization operation, and Relu and Softmax denote the activation functions used. The class1_0, class1_1, class2_0 and class2_1 output layers employ the softmax activation function:
P_i = e^(x_i) / ∑_j e^(x_j)
where x_i is the output of the i-th neuron and the denominator sums the exponentials over all output neurons. The probability values that the formula outputs for the neural nodes sum to 1.
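For illustration, a minimal NumPy sketch of this softmax (a generic implementation consistent with the formula above, not code from the patent):

import numpy as np

def softmax(x):
    """Softmax over the class axis. Subtracting the maximum is a standard
    numerical-stability trick and is not part of the patent text."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# Example: four class scores become probabilities that sum to 1.
print(softmax(np.array([2.0, 1.0, 0.5, -1.0])))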
The network model is trained with the samples, and training comprises the following steps:
First update: the loss of the feature map output by the first target output layer is calculated. Before this loss is calculated, the targets in the sample are classified according to the following steps to obtain their classification labels:
calculating the IOU between the target frame output by the target frame output layer and the real target frame of the target in the sample according to the following formula:
IOU = S_inter / (S_gt + S_box − S_inter), where S_inter = max(0, min(gt_x2, box_x2) − max(gt_x1, box_x1)) × max(0, min(gt_y2, box_y2) − max(gt_y1, box_y1)) is the overlap area of the two frames, and S_gt and S_box are the areas of the real target frame and the output target frame respectively
where (gt_x1, gt_y1) and (gt_x2, gt_y2) are the upper-left and lower-right points of the real target frame of the target in the sample, and (box_x1, box_y1) and (box_x2, box_y2) are the upper-left and lower-right points of the target frame output by the target frame output layer;
determining the classification of the target in the sample according to the IOU, and marking a classification label:
label = pos_class if IOU > thresh, and label = neg_class otherwise
where neg_class denotes a negative sample frame, pos_class denotes a positive sample frame, and thresh is the threshold distinguishing positive from negative sample frames; in this embodiment thresh is 0.6.
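As an illustration, a minimal Python sketch of the IOU computation and label assignment under the corner-point convention above (the function names, the epsilon, and the integer label values are assumptions of this sketch):

def iou(gt, box):
    """IOU between the real target frame gt and the output target frame box,
    each given as (x1, y1, x2, y2) with (x1, y1) the upper-left point."""
    ix1, iy1 = max(gt[0], box[0]), max(gt[1], box[1])
    ix2, iy2 = min(gt[2], box[2]), min(gt[3], box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area_gt + area_box - inter + 1e-9)  # epsilon avoids 0/0

def classify(gt, box, thresh=0.6, pos_class=1, neg_class=0):
    """Mark a positive sample frame when the IOU exceeds thresh (0.6 here)."""
    return pos_class if iou(gt, box) > thresh else neg_class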
After the classification labels of the classified targets are obtained, the loss of the feature map output by the target output layer is calculated from the classification labels according to the following formula:
E_loss_class = −∑_i L_i · log(P_i)
where E_loss_class is the classification loss, L_i is the classification label in one-hot form, and P_i is the output value of the network model.
At the same time, the loss of the target frame output layer is calculated according to the following formula:
[Equation image: loss formula E_loss_frame for the target frame output layer]
where E_loss_frame is the loss of the target frame output layer.
The feature map output by the first target output layer is then output to the next target output layer, and the network model is updated by backpropagation according to the calculated losses, completing the first update step.
Specifically, this embodiment has two levels of prediction, each level having two target output layers, namely class1_0, class1_1, class2_0 and class2_1, and the feature maps output by the target output layers are denoted f_stage1_0, f_stage1_1, f_stage2_0 and f_stage2_1 respectively, each of size (batch, f_num_box, class), where batch is the number of samples input into the network model, f_num_box is the number of target frames in the feature map, and class is the number of classes; in this embodiment the number of classes is 4, including pedestrians, non-motor vehicles and the background. The first update calculates the losses of the f_stage1_0 and f_stage2_0 feature maps from the classification labels and updates the network model by backpropagation, after which the feature maps f_stage1_0 and f_stage2_0 are output to the class1_1 and class2_1 layers.
Cyclic update: the next target output layer filters the received feature map. The feature map output by the target output layer comprises a class axis, and filtering the received feature map includes the following steps:
obtaining the maximum value along the class axis and its corresponding index value;
filtering the index value according to the following formula, wherein the filtered index value is used as a filtered result:
filtered result = f_stage_i_0_index wherever f_stage_i_0_value > thresh' (index values whose maximum does not exceed the threshold are discarded)
where f_stage_i_0_value is the maximum value along the class axis in the first feature map of the i-th stage, f_stage_i_0_index is the index value corresponding to that maximum, and thresh' is the label threshold.
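As an illustration, a minimal NumPy sketch of this filtering step (the (batch, f_num_box, class) layout follows the embodiment described below; using -1 to mark filtered-out frames is an assumption of this sketch):

import numpy as np

def filter_feature_map(f_stage_i_0, thresh_prime=0.6):
    """f_stage_i_0: scores of shape (batch, f_num_box, class).

    Returns, per target frame, the class index wherever the class-axis
    maximum exceeds the label threshold thresh'; other frames get -1.
    """
    value = f_stage_i_0.max(axis=-1)     # f_stage_i_0_value
    index = f_stage_i_0.argmax(axis=-1)  # f_stage_i_0_index
    return np.where(value > thresh_prime, index, -1)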
The loss of the feature map output by this target output layer is calculated from the label information following the loss-calculation steps above, the feature map output by this target output layer is output to the next target output layer, and the network model is updated by backpropagation according to the calculated loss;
the real label often has noise and abnormal points, and the result is sent to the second step after the first step iterative learning, so that the label learned by the second step is more friendly and tends to a soft label learned by a network. Through multi-step learning, the network strengthens the training result of the network, and can well process abnormal values.
And repeating the cyclic updating step until the last target output layer.
In this embodiment, the maximum values along the class axis of the feature maps f_stage1_0 and f_stage2_0 output by the target output layers, together with their corresponding indices, are denoted f_stage1_0_value, f_stage1_0_index, f_stage2_0_value and f_stage2_0_index. The index values are filtered according to the filtering formula, in which i takes the value 1 or 2 according to the sequence number of the target output layer:
filtered result = f_stage_i_0_index wherever f_stage_i_0_value > thresh', for i = 1, 2
The label threshold thresh' is taken as 0.6, and the filtered index values are used as the label information for the feature maps f_stage1_1 and f_stage2_1, whose losses are calculated before the network model is finally updated by backpropagation. Both training steps are executed each time the network model is trained, and the first and second steps are cycled continuously to iteratively optimize the network model. In general, real labels often contain noise and outliers; because the result of the first iterative learning step is fed into the second step, the labels learned in the second step are friendlier and tend toward soft labels learned by the network. Through multi-step learning, the network reinforces its own training result and can handle outliers well.
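As an illustration, the two-step cycle can be sketched in PyTorch as below. The model interface, the use of cross-entropy on raw (pre-softmax) outputs, and all names here are assumptions of this sketch; only the order of the two updates follows the text:

import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, class_labels, thresh_prime=0.6):
    """One training pass: first update on the *_0 layers with the real
    classification labels, then a cyclic update on the *_1 layers using
    the filtered *_0 outputs as labels. model(images) is assumed to return
    a dict of per-layer maps of shape (batch, f_num_box, class) (logits)."""
    # Step 1: first update from the real classification labels.
    out = model(images)
    loss1 = sum(F.cross_entropy(out[f'f_stage{i}_0'].flatten(0, 1),
                                class_labels[f'stage{i}'].flatten())
                for i in (1, 2))
    optimizer.zero_grad()
    loss1.backward()
    optimizer.step()

    # Step 2: cyclic update; filtered *_0 indices become *_1 labels.
    out = model(images)
    loss2 = 0.0
    for i in (1, 2):
        with torch.no_grad():
            value, index = out[f'f_stage{i}_0'].max(dim=-1)
            index[value <= thresh_prime] = -100  # filtered out, ignored below
        loss2 = loss2 + F.cross_entropy(out[f'f_stage{i}_1'].flatten(0, 1),
                                        index.flatten(), ignore_index=-100)
    optimizer.zero_grad()
    loss2.backward()
    optimizer.step()
    return loss1.item(), loss2.item()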
The feature maps output by the target output layers are then screened according to the following formula:
[Equation image: screening rule combining f_stage_i_last and f_stage_i_(last-1) against the screening threshold thresh'']
where f_stage_i_out is the screened feature map, f_stage_i_last is the feature map output by the last target output layer, f_stage_i_(last-1) is the feature map output by the target output layer preceding the last one, and thresh'' is the screening threshold;
The screened feature map is traversed and the coordinates of targets whose confidence exceeds the confidence threshold are screened out; the corresponding coordinates of the target in the sample are then decoded from the screened coordinates according to the following formula:
[Equation image: coordinate-decoding formula mapping the screened coordinates and the predicted offsets, scaled by stride, to target-frame coordinates in the sample]
where x, y are the coordinates of the screened target; x1_offset, y1_offset are the coordinates of the upper-left point of the target frame predicted by the target frame output layer; x2_offset, y2_offset are the coordinates of the lower-right point of the predicted target frame; box_x1, box_y1 are the coordinates of the upper-left point of the target frame output by the target frame output layer; box_x2, box_y2 are the coordinates of its lower-right point; and stride is the step size of the feature map relative to the sample; in this embodiment the stride of the 1st-level feature map relative to the original image is 8, and that of the 2nd level is 16;
A non-maximum suppression operation is then performed on the output target frames.
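As an illustration, a minimal greedy non-maximum suppression sketch in Python; the patent does not specify the NMS variant or its IOU threshold, so 0.5 is an assumption. It reuses the iou() helper sketched above:

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over output target frames (x1, y1, x2, y2) with confidences."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop remaining frames that overlap the kept frame too strongly.
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_thresh]
    return keep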
In this embodiment, the formula for screening the feature maps f_stage1_0 and f_stage2_0 output by the two levels of target output layers is specifically:
[Equation image: the screening formula of the embodiment, instantiated for the two levels]
where i and last respectively take the value 1 or 2 according to the sequence numbers of the level and of the target output layer.
In this embodiment, the network model is trained using the Adam optimization method, with a base learning rate of 0.001 and a training batch size of 25. The network model is trained on the samples at this base learning rate, the target detection network model is iterated, and finally the trained network model is used to detect road targets.
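For illustration, the stated optimizer configuration in PyTorch; the stand-in model and dataset are assumptions of this sketch, while the learning rate and batch size are the values given above:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 4)  # stand-in for the detection network (assumption)
train_dataset = TensorDataset(torch.randn(100, 10),
                              torch.randint(0, 4, (100,)))

# Adam with the base learning rate and training batch size of this embodiment.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loader = DataLoader(train_dataset, batch_size=25, shuffle=True)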
In the technical scheme provided by this embodiment, classification labels are assigned dynamically for single-stage target detection, and a multi-step learning method is adopted within each training pass, so that detection is more stable and more robust features are learned; the detection rate of the detection network is improved while the single-stage end-to-end network's suitability for embedded deployment is retained.
Example two
The present embodiment provides a computer device comprising a memory and a processor, the memory having stored therein a computer program; the processor implements the method of any of the above embodiments when executing the computer program. It will be understood by those skilled in the art that all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing associated hardware. Accordingly, the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can implement the method of any one of the above embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that the invention is not limited thereto, and may be embodied in other forms without departing from the spirit or essential characteristics thereof. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.

Claims (12)

1. A target detection method for detecting road targets, comprising the steps of:
acquiring road picture data and preprocessing it for use as samples;
establishing a network model, training the network model with the samples, and detecting road targets with the trained network model;
characterized in that the network model comprises a target frame output layer and at least two target output layers, each target output layer outputting a feature map,
the network model training method comprises the following steps:
first update: calculating the loss of the feature map output by the first target output layer, outputting the feature map output by the first target output layer to the next target output layer, and updating the network model by backpropagation;
cyclic update: the next target output layer filters the received feature map; the filtered result is used as label information for the feature map output by this target output layer; the loss of the feature map output by this target output layer is calculated according to the label information; the feature map output by this target output layer is output to the next target output layer; and the network model is updated by backpropagation;
and repeating the cyclic update step until the last target output layer.
2. The method of claim 1, wherein, before the loss of the feature map output by the target output layer is calculated, the targets in the sample are classified to obtain classification labels, and the loss of the feature map output by the target output layer is then calculated according to the following formula:
E_loss_class = −∑_i L_i · log(P_i)
where E_loss_class is the classification loss, L_i is the classification label in one-hot form, and P_i is the output value of the network model.
3. The method of claim 2, wherein classifying the objects in the sample comprises the steps of:
calculating the IOU between the target frame output by the target frame output layer and the real target frame of the target in the sample according to the following formula:
IOU = S_inter / (S_gt + S_box − S_inter), where S_inter = max(0, min(gt_x2, box_x2) − max(gt_x1, box_x1)) × max(0, min(gt_y2, box_y2) − max(gt_y1, box_y1)) is the overlap area of the two frames, and S_gt and S_box are the areas of the real target frame and the output target frame respectively
where (gt_x1, gt_y1) and (gt_x2, gt_y2) are the upper-left and lower-right points of the real target frame of the target in the sample, and (box_x1, box_y1) and (box_x2, box_y2) are the upper-left and lower-right points of the target frame output by the target frame output layer;
determining the classification of the target in the sample according to the IOU, and marking a classification label:
label = pos_class if IOU > thresh, and label = neg_class otherwise
where neg_class denotes a negative sample frame, pos_class denotes a positive sample frame, and thresh is the threshold distinguishing positive from negative sample frames.
4. The method of claim 2, wherein, when the network model is trained using the samples, the loss of the target frame output layer is calculated according to the following formula:
[Equation image: loss formula E_loss_frame for the target frame output layer]
where E_loss_frame is the loss of the target frame output layer.
5. The target detection method of claim 1, wherein the feature map output by the target output layer comprises a class axis, and filtering the received feature map comprises the steps of:
obtaining the maximum value along the class axis and its corresponding index value;
filtering the index value according to the following formula, wherein the filtered index value is used as a filtered result:
filtered result = f_stage_i_0_index wherever f_stage_i_0_value > thresh' (index values whose maximum does not exceed the threshold are discarded)
where f_stage_i_0_value is the maximum value along the class axis in the first feature map of the i-th stage, f_stage_i_0_index is the index value corresponding to that maximum, and thresh' is the label threshold.
6. The method of claim 5, wherein, when the network model is trained, the feature map output by the target output layer is screened according to the following formula:
[Equation image: screening rule combining f_stage_i_last and f_stage_i_(last-1) against the screening threshold thresh'']
where f_stage_i_out is the screened feature map, f_stage_i_last is the feature map output by the last target output layer, f_stage_i_(last-1) is the feature map output by the target output layer preceding the last one, and thresh'' is the screening threshold;
and traversing the screened feature map and screening out the coordinates of targets whose confidence exceeds the confidence threshold.
7. The target detection method of claim 6, wherein, when the network model is trained, the corresponding coordinates of the target in the sample are decoded from the coordinates of the screened targets according to the following formula:
[Equation image: coordinate-decoding formula mapping the screened coordinates and the predicted offsets, scaled by stride, to target-frame coordinates in the sample]
where x, y are the coordinates of the screened target; x1_offset, y1_offset are the coordinates of the upper-left point of the target frame predicted by the target frame output layer; x2_offset, y2_offset are the coordinates of the lower-right point of the predicted target frame; box_x1, box_y1 are the coordinates of the upper-left point of the target frame output by the target frame output layer; box_x2, box_y2 are the coordinates of its lower-right point; and stride is the step size of the feature map relative to the sample;
and finally performing a non-maximum suppression operation on the output target frames.
8. The method according to any one of claims 1 to 7, wherein the network model has several levels of target detection, different levels of target detection being directed to targets of different sizes, and each level of target detection comprises a target frame output layer and at least two target output layers.
9. The target detection method according to any one of claims 1 to 7, wherein acquiring and preprocessing the road picture data comprises the following steps:
selecting road picture data in a natural scene;
normalizing the road picture data according to the following formula:
[Equation image: normalization formula applied to the road picture data m]
where m is the road picture data and the result of the formula is the normalized input data;
and randomly scaling the normalized road picture data and randomly cropping it after scaling, the size of each cropped block being 256 × 256; if a cropped block contains no road target, the current block is used as a negative sample.
10. The method of any one of claims 1 to 7, wherein the network model is trained using an Adam optimization method with a base learning rate of 0.001 and a training batch size of 25.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 10 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 10.
CN202011129543.5A 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium Active CN111967452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011129543.5A CN111967452B (en) 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011129543.5A CN111967452B (en) 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111967452A 2020-11-20
CN111967452B (en) 2021-02-02

Family

ID=73387228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011129543.5A Active CN111967452B (en) 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111967452B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283541A (en) * 2021-06-15 2021-08-20 无锡锤头鲨智能科技有限公司 Automatic floor sorting method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256498A (en) * 2018-02-01 2018-07-06 上海海事大学 A kind of non power driven vehicle object detection method based on EdgeBoxes and FastR-CNN
US10223611B1 (en) * 2018-03-08 2019-03-05 Capital One Services, Llc Object detection using image classification models
CN111210439A (en) * 2019-12-26 2020-05-29 中国地质大学(武汉) Semantic segmentation method and device by suppressing uninteresting information and storage device


Also Published As

Publication number Publication date
CN111967452B (en) 2021-02-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A target detection method, computer device, and readable storage medium

Effective date of registration: 20230308

Granted publication date: 20210202

Pledgee: Fuyang sub branch of Bank of Hangzhou Co.,Ltd.

Pledgor: Hangzhou xiongmai integrated circuit technology Co.,Ltd.

Registration number: Y2023330000470

CP03 Change of name, title or address

Address after: 311422 4th floor, building 9, Yinhu innovation center, 9 Fuxian Road, Yinhu street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Xinmai Microelectronics Co.,Ltd.

Address before: 311400 4th floor, building 9, Yinhu innovation center, No.9 Fuxian Road, Yinhu street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee before: Hangzhou xiongmai integrated circuit technology Co.,Ltd.
