CN111967452B - Target detection method, computer equipment and readable storage medium - Google Patents


Info

Publication number
CN111967452B
Authority
CN
China
Prior art keywords
target
output
output layer
network model
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011129543.5A
Other languages
Chinese (zh)
Other versions
CN111967452A (en)
Inventor
Zhang Hao (张浩)
Current Assignee
Zhejiang Xinmai Microelectronics Co ltd
Original Assignee
Hangzhou Xiongmai Integrated Circuit Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Xiongmai Integrated Circuit Technology Co Ltd
Priority to CN202011129543.5A
Publication of CN111967452A
Application granted
Publication of CN111967452B
Legal status: Active


Classifications

    • G06V 20/00 Scenes; scene-specific elements
    • G06F 18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Combinations of networks
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 2201/07 Target detection

Abstract

The invention discloses a target detection method, computer equipment and a readable storage medium, relating to the technical field of target detection. In the scheme provided by the invention, the first step calculates the loss of the first output feature map of a stage using the classification labels and updates the network by back-propagation; the second step filters the first output feature map of the same stage, decodes the result into classification labels for the second output feature map, calculates the classification loss, and again updates the network by back-propagation. The two steps are cycled continuously to iteratively optimize the network, improving the detection performance of a single-stage end-to-end detection network.

Description

Target detection method, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method, computer equipment and a readable storage medium.
Background
In the prior art, deep learning methods are generally used for road target detection. Current algorithms based on a region proposal network (RPN) include R-CNN, Fast R-CNN, Faster R-CNN, Cascade R-CNN, etc. Single-stage end-to-end algorithms with anchor boxes include the YOLO series, the SSD series, etc. Single-stage end-to-end anchor-free algorithms include FCOS, CenterNet, CornerNet, etc.
However, although the target detection accuracy of a multi-stage detection network with a region proposal network (RPN) is high, its time complexity is much higher, so it is not suitable for embedded deployment. A single-stage end-to-end detection network is fast and suitable for embedded deployment, but its detection accuracy is poorer. On this basis, the single-stage end-to-end detection network is continuously improved in order to raise the detection rate of the network.
Disclosure of Invention
In order to solve the foregoing problems, the present invention provides a target detection method, which improves the detection performance of a single-stage end-to-end detection network.
In order to achieve the purpose, the invention adopts the following technical scheme:
an object detection method for detecting a road object, comprising the steps of:
acquiring road picture data and preprocessing the road picture data to be used as a sample;
establishing a network model, training the network model by using a sample, and detecting a road target by using the trained network model;
the network model comprises a target frame output layer and at least two target output layers, wherein each target output layer outputs a feature map, and the feature map output by each target output layer comprises a class axis;
the network model training method comprises the following steps:
first update: calculating the loss of the feature map output by the first target output layer, outputting that feature map to the next target output layer, and updating the network model by back-propagation;
cyclic update: the next target output layer filters the received feature map, the filtering comprising the following steps:
obtaining the maximum value along the class axis and its corresponding index value;
filtering the index values according to the following formula, the filtered index values being used as the filtered result:

f_stage_i_0_out = f_stage_i_0_index, if f_stage_i_0_value > thresh'

where f_stage_i_0_value is the maximum value of the class axis in the first feature map of the i-th stage, f_stage_i_0_index is the index value corresponding to that maximum, and thresh' is the label threshold;
the filtered result is used as the label information of the feature map output by the current target output layer, the loss of that feature map is calculated according to the label information, the feature map is output to the next target output layer, and the network model is updated by back-propagation;
the cyclic update step is repeated until the last target output layer.
Optionally, before calculating the loss of the feature map output by the target output layer, classifying the targets in the sample to obtain classification labels of the classified targets, and then calculating the loss of the feature map output by the target output layer according to the following formula:
E_loss_class = -Σ_i L_i · log(y_i)

where E_loss_class is the classification loss, L_i is the classification label in one-hot form, and y_i is the output value of the network model.
Optionally, the classifying the target in the sample includes the following steps:
calculating the IOU between the target frame output by the target frame output layer and the real target frame of the target in the sample according to the following formula:
IOU = area of intersection of the two frames / area of union of the two frames

where the two frames are the real target frame of the target in the sample, given by its upper-left point and lower-right point, and the target frame output by the target frame output layer, likewise given by its upper-left point and lower-right point;
determining the classification of the target in the sample according to the IOU, and marking a classification label:
class = pos_class, if IOU > thresh; otherwise class = neg_class

where neg_class is the negative sample frame, pos_class is the positive sample frame, and thresh is the threshold distinguishing positive and negative sample frames.
Optionally, when the network model is trained by using the sample, the loss is calculated for the target frame output layer according to the following formula:
Figure 840970DEST_PATH_IMAGE008
where E_loss_frame is the loss of the target frame output layer.
Optionally, when the network model is trained, the feature graph output by the target output layer is screened according to the following formula:
Figure 41007DEST_PATH_IMAGE009
where f_stage_i_out is the screened feature map, f_stage_i_last is the feature map output by the last target output layer, f_stage_i_(last-1) is the feature map output by the target output layer preceding the last one, and thresh'' is the screening threshold;
traverse the screened feature map and screen out the coordinates of targets greater than the confidence threshold.
Optionally, when the network model is trained, the coordinates of the target in the sample are decoded according to the coordinates of the screened target according to the following formula:
Figure 706474DEST_PATH_IMAGE010
where x and y are the coordinates of the screened target, x1_offset and y1_offset are the coordinates of the upper-left point of the target frame predicted by the target frame output layer, x2_offset and y2_offset are the coordinates of the lower-right point of the predicted target frame, box_x1 and box_y1 are the coordinates of the upper-left point of the target frame output by the target frame output layer, box_x2 and box_y2 are the coordinates of its lower-right point, and stride is the step size of the feature map relative to the sample;
and finally, carrying out non-maximum suppression operation on the output target frame.
Optionally, the network model has several levels of target detection, where different levels of target detection are for targets of different sizes, and each level of target detection includes a target frame output layer and at least two target output layers.
Optionally, the acquiring and preprocessing the road picture data includes the following steps:
selecting road picture data in a natural scene;
normalizing the road picture data according to the following formula:
Figure 212253DEST_PATH_IMAGE011
where the result is the normalized input data and m is the road picture data;
randomly scale the normalized road picture data, then randomly crop it; each crop is 256 × 256 in size, and if a crop contains no road target, the current crop is used as a negative sample.
Optionally, an Adam optimization method is used for training the network model, the basic learning rate is 0.001, and the training batch size is 25.
The invention has the following beneficial effects:
according to the technical scheme provided by the invention, the classification label is dynamically assigned to the single-stage target detection, and a multi-step learning method is adopted in the sub-training process, so that the network detection is more stable, the more robust characteristic is learned, and the detection rate of the detection network is improved while the characteristic that the single-stage end-to-end detection network is suitable for embedded end deployment is maintained.
Furthermore, the present invention also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing any of the above methods when executing the computer program.
Meanwhile, the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method of any one of the above.
These features and advantages of the present invention will be disclosed in more detail in the following detailed description. The preferred embodiments and means described are not intended to limit the technical scope of the present invention. In addition, the features, elements and components appearing hereinafter may occur in the plural and are given different symbols or numerals for convenience of representation, but all denote the same or similar structural or functional parts.
Detailed Description
The technical solutions of the embodiments of the present invention are explained and illustrated below, but the following embodiments are only preferred embodiments of the present invention, not all of them. Based on these embodiments, other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Reference in the specification to "one embodiment" or "an example" means that a particular feature, structure or characteristic described in connection with the embodiment itself may be included in at least one embodiment of the patent disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
The first embodiment is as follows:
the embodiment provides a target detection method for detecting a road target, which comprises the following steps:
and acquiring road picture data and preprocessing the road picture data to be used as a sample. In this step, firstly, road picture data in a natural scene is selected; then, normalizing the road picture data according to the following formula:
Figure 144754DEST_PATH_IMAGE013
where the result is the normalized input data and m is the road picture data;
randomly scale the normalized road picture data to adapt to different target sizes, then take random crops of size 256 × 256; if a crop contains no road target, the current crop is used as a negative sample for training, to increase the network's negative-sample learning. Finally, data-enhancement operations such as Gaussian blur, brightness jitter, flipping and Cutout are applied at random.
A network model is then established, comprising a target frame output layer and at least two target output layers, each target output layer outputting a feature map. Because the size of road targets varies greatly, the network model has several levels of target detection, different levels being aimed at targets of different sizes, and each level comprising a target frame output layer and at least two target output layers. Specifically, the network model provided in this embodiment has two levels of prediction, each with two target output layers; the 1st level predicts road targets with height greater than 8 and less than 48, and the 2nd level predicts road targets with height greater than 48 and less than 256:
target ∈ stage1, if 8 < h < 48; target ∈ stage2, if 48 < h < 256
where w is the width of the road target, h is the height of the road target, and stage1 and stage2 respectively refer to the two-stage outputs of the network.
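The height ranges above can be read as a simple assignment rule; how the boundary value h = 48 is handled is not stated in the source, so the choice below is an assumption:

```python
def assign_stage(h):
    """Assign a road target to a prediction level by its height h
    (stage 1: 8 < h < 48, stage 2: 48 <= h < 256; returns None for
    targets outside the detectable range)."""
    if 8 < h < 48:
        return 1
    if 48 <= h < 256:
        return 2
    return None
```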
The specific network configuration is as follows:
Figure 278298DEST_PATH_IMAGE016
Figure 452928DEST_PATH_IMAGE017
where k represents the convolution kernel size, n represents the number of output convolution feature maps, s represents the convolution sliding step, Bn represents the BatchNormalization operation, and ReLU and Softmax represent the activation functions used. The class1_0, class1_1, class2_0 and class2_1 output layers employ the softmax activation function:
softmax(x_i) = e^{x_i} / Σ_j e^{x_j}

where x_i refers to the output of the i-th neuron and the denominator sums the exponentials of all output neurons, so that the probability values of the neural nodes output by the formula sum to 1.
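The softmax described above can be sketched as follows; subtracting the per-row maximum before exponentiating is a standard numerical-stability step added here, not part of the source formula:

```python
import numpy as np

def softmax(x):
    """Softmax over the last axis: exponentiate each neuron output and
    divide by the sum of all exponentials, so the outputs sum to 1."""
    z = np.asarray(x, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # stability shift
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```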
Training the network model by using the sample, wherein the training of the network model comprises the following steps:
First update: calculate the loss of the feature map output by the first target output layer. Before calculating this loss, classify the targets in the sample according to the following steps to obtain the classification labels of the classified targets:
calculating the IOU between the target frame output by the target frame output layer and the real target frame of the target in the sample according to the following formula:
IOU = area of intersection of the two frames / area of union of the two frames

where the two frames are the real target frame of the target in the sample, given by its upper-left point and lower-right point, and the target frame output by the target frame output layer, likewise given by its upper-left point and lower-right point;
determining the classification of the target in the sample according to the IOU, and marking a classification label:
class = pos_class, if IOU > thresh; otherwise class = neg_class

where neg_class is the negative sample frame, pos_class is the positive sample frame, and thresh is the threshold distinguishing positive and negative sample frames; in this embodiment, thresh is 0.6.
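The IOU-based labelling of this embodiment can be sketched as follows, assuming (x1, y1, x2, y2) box coordinates; the helper names are introduced here for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    iy = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def classify_box(pred_box, gt_box, thresh=0.6):
    """Mark a predicted frame as a positive or negative sample frame by
    its IOU with the real target frame, using thresh = 0.6 as in this
    embodiment."""
    return "pos_class" if iou(pred_box, gt_box) > thresh else "neg_class"
```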
After the classification label of the classification target is obtained, calculating the loss of the feature map output by the target output layer according to the classification label and the following formula:
E_loss_class = -Σ_i L_i · log(y_i)

where E_loss_class is the classification loss, L_i is the classification label in one-hot form, and y_i is the output value of the network model.
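A sketch of the classification loss as reconstructed above; averaging over the boxes in a batch is an assumption here, since the source gives the formula only per output:

```python
import numpy as np

def class_loss(one_hot_labels, probs, eps=1e-12):
    """One-hot cross-entropy: E_loss_class = -sum_i L_i * log(y_i),
    averaged over boxes. eps guards against log(0)."""
    one_hot_labels = np.asarray(one_hot_labels, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(-(one_hot_labels * np.log(probs + eps)).sum(axis=-1).mean())
```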
At the same time, the loss is calculated for the target box output layer according to the following formula:
Figure 508477DEST_PATH_IMAGE025
where E_loss_frame is the loss of the target frame output layer.
The feature map output by the first target output layer is then output to the next target output layer, and the network model is updated by back-propagation according to the calculated losses, completing the first update step.
In particular, this embodiment has two levels of prediction, each with two target output layers, namely class1_0, class1_1, class2_0 and class2_1; the feature maps they output are denoted f_stage1_0, f_stage1_1, f_stage2_0 and f_stage2_1 respectively, with size (batch, f_num_box, class), where batch is the number of samples input into the network model, f_num_box is the number of target frames of the feature map, and class is the number of classes. In this embodiment the number of classes is 4, i.e. pedestrians, motor vehicles, non-motor vehicles and the background. The first update calculates the losses of the f_stage1_0 and f_stage2_0 feature maps from the classification labels and updates the network model by back-propagation, and then outputs the feature maps f_stage1_0 and f_stage2_0 to the class1_1 and class2_1 layers.
Cyclic update: the next target output layer filters the received feature map. The feature map output by the target output layer comprises a class axis, and filtering the received feature map comprises the following steps:
obtain the maximum value along the class axis and its corresponding index value;
filtering the index value according to the following formula, wherein the filtered index value is used as a filtered result:
f_stage_i_0_out = f_stage_i_0_index, if f_stage_i_0_value > thresh' (positions that do not satisfy the condition are filtered out)

where f_stage_i_0_value is the maximum value of the class axis in the first feature map of the i-th stage, f_stage_i_0_index is the index value corresponding to that maximum, and thresh' is the label threshold;
the filtered result is used as the label information of the feature map output by the current target output layer; the loss of that feature map is calculated from the label information as in the loss-calculation step above, the feature map is output to the next target output layer, and the network model is updated by back-propagation according to the calculated loss;
Real labels often contain noise and outliers. After the iterative learning of the first step, the result is fed to the second step, so the labels learned in the second step are friendlier, tending toward soft labels learned by the network. Through multi-step learning, the network reinforces its training results and can handle outliers well.
And repeating the cyclic updating step until the last target output layer.
In this embodiment, the maximum value along the class axis of the feature maps f_stage1_0 and f_stage2_0 output by the target output layers, together with its corresponding index, is denoted f_stage1_0_value, f_stage1_0_index, f_stage2_0_value and f_stage2_0_index respectively. The index values are filtered according to the filtering formula, in which i takes the value 1 or 2 according to the sequence number of the target output layer:

f_stage_i_0_out = f_stage_i_0_index, if f_stage_i_0_value > thresh'
tag thresholdthresh’Taking the index value as 0.6f_stage1_1、f_stage2_1, calculating loss of label information of the characteristic diagram, finally updating the network model reversely, training the network model each time, executing the two training steps, continuously circulating the first step and the second step to iteratively optimize the network model, wherein usually, the real label often has noise and abnormal points, and sending the result to the second step after the first step iterative learning, so that the label learned by the second step is more friendly and tends to the soft label learned by the network. Through multi-step learning, the network strengthens the training result of the network, and can well process abnormal values.
Then, the feature graph output by the target output layer is screened according to the following formula:
Figure 167495DEST_PATH_IMAGE028
where f_stage_i_out is the screened feature map, f_stage_i_last is the feature map output by the last target output layer, f_stage_i_(last-1) is the feature map output by the target output layer preceding the last one, and thresh'' is the screening threshold;
traverse the screened feature map and screen out the coordinates of targets greater than the confidence threshold; then decode the corresponding coordinates of the target in the sample from the screened coordinates according to the following formula:
Figure 546523DEST_PATH_IMAGE029
where x and y are the coordinates of the screened target, x1_offset and y1_offset are the coordinates of the upper-left point of the target frame predicted by the target frame output layer, x2_offset and y2_offset are the coordinates of the lower-right point of the predicted target frame, box_x1 and box_y1 are the coordinates of the upper-left point of the target frame output by the target frame output layer, box_x2 and box_y2 are the coordinates of its lower-right point, and stride is the step size of the feature map relative to the sample; in this embodiment, the step size of level 1 relative to the original image is 8, and the step size of level 2 is 16;
and then carrying out non-maximum suppression operation on the output target frame.
In this embodiment, the formula for screening the feature maps f_stage1_0 and f_stage2_0 output by the two-level target output layers is specifically as follows:
Figure 2913DEST_PATH_IMAGE030
where i and last each take the value 1 or 2 according to the sequence number of the target output layer.
In this embodiment, the Adam optimization method is used for training the network model, the basic learning rate is 0.001, and the training batch size is 25. And training the network model by using the samples according to the basic learning rate, iterating the target detection network model, and finally, detecting the road target by using the trained network model.
According to the technical scheme provided by this embodiment, classification labels are dynamically assigned during single-stage target detection, and a multi-step learning method is adopted within each training pass, so that detection is more stable and more robust features are learned; the detection rate of the detection network is improved while the suitability of a single-stage end-to-end detection network for embedded deployment is retained.
Example two
The present embodiment provides a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the method of any of the above embodiments when executing the computer program. It will be understood by those skilled in the art that all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing associated hardware. Accordingly, the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can implement the method of any one of the above embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and direct Rambus DRAM (DRDRAM).
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that the invention is not limited thereto, and may be embodied in other forms without departing from the spirit or essential characteristics thereof. Any modification which does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the claims.

Claims (11)

1. An object detection method for detection of a road object, comprising the steps of:
acquiring road picture data and preprocessing the road picture data to be used as a sample;
establishing a network model, training the network model by using a sample, and detecting a road target by using the trained network model;
the method being characterized in that the network model comprises a target frame output layer and at least two target output layers, each target output layer outputs a feature map, and the feature map output by each target output layer comprises a class axis;
the network model training method comprises the following steps:
first update: calculating the loss of the feature map output by the first target output layer, outputting that feature map to the next target output layer, and updating the network model by back-propagation;
cyclic update: the next target output layer filters the received feature map, the filtering comprising the following steps:
obtaining the maximum value along the class axis and its corresponding index value;
filtering the index values according to the following formula, the filtered index values being used as the filtered result:

f_stage_i_0_out = f_stage_i_0_index, if f_stage_i_0_value > thresh'

where f_stage_i_0_value is the maximum value of the class axis in the first feature map of the i-th stage, f_stage_i_0_index is the index value corresponding to that maximum, and thresh' is the label threshold;
using the filtered result as the label information of the feature map output by the current target output layer, calculating the loss of that feature map according to the label information, outputting the feature map to the next target output layer, and updating the network model by back-propagation;
and repeating the cyclic updating step until the last target output layer.
2. The method of claim 1, wherein before calculating the loss of the feature map output by the target output layer, the targets in the sample are classified to obtain classification labels of the classified targets, and then the loss of the feature map output by the target output layer is calculated according to the following formula:
(formula not reproduced: available only as an image in the original)
wherein E_loss_class is the classification loss of the target, L_i is the classification label in one-hot form, and (symbol not reproduced: available only as an image in the original) is the output value of the network model.
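The classification loss of claim 2 is stated over one-hot labels L_i and the network's output value; the formula itself is available only as an image, so the standard cross-entropy form below is a hedged reading, not a verbatim reproduction.

```python
import numpy as np

def classification_loss(one_hot_labels, predictions, eps=1e-12):
    """Cross-entropy between one-hot labels L_i and network output values.

    Assumed form of the claim's loss: the original formula is shown only
    as an image. eps guards against log(0).
    """
    predictions = np.clip(predictions, eps, 1.0)
    return -np.sum(one_hot_labels * np.log(predictions))

L = np.array([0.0, 1.0, 0.0])   # one-hot classification label
p = np.array([0.1, 0.8, 0.1])   # network output (e.g. softmax scores)
loss = classification_loss(L, p)  # -log(0.8)
```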
3. The method of claim 2, wherein classifying the objects in the sample comprises the steps of:
calculating the IOU between the target frame output by the target frame output layer and the real target frame of the target in the sample according to the following formula:
(IOU formula not reproduced: available only as an image in the original)
wherein the first pair of points (not reproduced: available only as an image in the original) are the upper-left and lower-right points of the real target frame of the target in the sample, and the second pair are the upper-left and lower-right points of the target frame output by the target frame output layer;
determining the classification of the target in the sample according to the IOU, and marking a classification label:
(formula not reproduced: available only as an image in the original)
wherein neg_class denotes a negative sample frame, pos_class denotes a positive sample frame, and thresh is the threshold for distinguishing positive from negative sample frames.
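The IOU computation and the positive/negative labelling of claim 3 can be illustrated as follows. The standard corner-point IOU is assumed, and since the text does not say whether the comparison with thresh is strict, `>=` below is an arbitrary choice.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two frames given as (x1, y1, x2, y2),
    with (x1, y1) the upper-left and (x2, y2) the lower-right point,
    matching the points named in claim 3."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def classify_sample(iou_value, thresh):
    """Label a predicted frame positive or negative by its IOU (>= assumed)."""
    return "pos_class" if iou_value >= thresh else "neg_class"

# Real frame (0,0)-(2,2) vs predicted frame (1,1)-(3,3):
# intersection 1, union 7
v = iou((0, 0, 2, 2), (1, 1, 3, 3))
label = classify_sample(v, 0.5)
```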
4. The method of claim 2, wherein when the network model is trained using the samples, the loss is calculated for the target box output layer according to the following formula:
(formula not reproduced: available only as an image in the original)
wherein E_loss_frame is the loss of the target frame output layer.
5. The method of claim 1, wherein when the network model is trained, the feature map output by the target output layer is filtered according to the following formula:
(formula not reproduced: available only as an image in the original)
wherein f_stage_i_out is the feature map after screening, f_stage_i_last is the feature map output by the last target output layer, f_stage_i_(last-1) is the feature map output by the target output layer preceding the last one, and thresh'' is a screening threshold;
and traversing the screened feature map to pick out the coordinates of targets whose confidence exceeds the confidence threshold.
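A sketch of the screening step of claim 5. The rule combining f_stage_i_last and f_stage_i_(last-1) survives only as an image in the original, so element-wise multiplication of the two maps is assumed here purely for illustration.

```python
import numpy as np

def screen_targets(f_last, f_prev, conf_thresh):
    """Combine the last two target-output-layer maps and return the
    coordinates of locations above the confidence threshold.

    The combination rule is an assumption (element-wise product);
    the original formula is shown only as an image.
    """
    f_out = f_last * f_prev                  # hypothetical combination
    ys, xs = np.where(f_out > conf_thresh)   # traverse the screened map
    return list(zip(xs.tolist(), ys.tolist()))

f_last = np.array([[0.9, 0.2], [0.1, 0.8]])
f_prev = np.array([[0.8, 0.9], [0.9, 0.9]])
coords = screen_targets(f_last, f_prev, 0.5)
# keeps (x=0, y=0) and (x=1, y=1)
```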
6. The target detection method of claim 5, wherein when the network model is trained, the corresponding coordinates of the target in the sample are decoded from the coordinates of the screened target according to the following formula:
(formula not reproduced: available only as an image in the original)
wherein x, y are the coordinates of the screened target, x1_offset, y1_offset are the coordinates of the upper-left point of the target frame predicted by the target frame output layer, x2_offset, y2_offset are the coordinates of the lower-right point of the predicted target frame, box_x1, box_y1 are the coordinates of the upper-left point of the target frame output by the target frame output layer, box_x2, box_y2 are the coordinates of its lower-right point, and stride is the step size of the feature map relative to the sample;
and finally, performing a non-maximum suppression operation on the output target frames.
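The decoding of claim 6 maps a screened feature-map location (x, y) back to sample coordinates using the predicted offsets and the stride. The exact formula is available only as an image, so the grid-cell-plus-offset form below is an assumption.

```python
def decode_box(x, y, offsets, stride):
    """Decode a predicted frame back to sample coordinates.

    (x, y): screened target's feature-map coordinates.
    offsets: (x1_offset, y1_offset, x2_offset, y2_offset) from the
             target frame output layer.
    stride:  step size of the feature map relative to the sample.
    Assumed decode: anchor point scaled by stride plus predicted offsets.
    """
    x1o, y1o, x2o, y2o = offsets
    box_x1 = x * stride + x1o
    box_y1 = y * stride + y1o
    box_x2 = x * stride + x2o
    box_y2 = y * stride + y2o
    return box_x1, box_y1, box_x2, box_y2

box = decode_box(2, 3, (-4.0, -4.0, 4.0, 4.0), 8)
```

The decoded frames would then pass through a standard non-maximum suppression step, as the claim states.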
7. The method according to one of claims 1 to 6, wherein the network model has several levels of target detection, different levels being directed at targets of different sizes, each level comprising a target frame output layer and at least two target output layers.
8. The target detection method according to one of claims 1 to 6, wherein obtaining and preprocessing road picture data comprises the steps of:
selecting road picture data in a natural scene;
normalizing the road picture data according to the following formula:
(formula not reproduced: available only as an image in the original)
wherein (symbol not reproduced: available only as an image in the original) is the input data after normalization, and m is the road picture data;
and randomly scaling the normalized road picture data and randomly cropping the scaled data, the cropped blocks being 256 × 256 in size; if a cropped block contains no road target, the current block is used as a negative sample.
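The preprocessing of claim 8 can be sketched as follows. The normalization formula survives only as an image in the original; (m - 128) / 128, mapping 8-bit pixels to roughly [-1, 1], is assumed here, and the random-scaling step is omitted for brevity.

```python
import numpy as np

def preprocess(image, crop_size=256):
    """Normalize a road image and take one random 256x256 crop.

    Normalization (image - 128) / 128 is an assumption; the patent's
    formula is shown only as an image. Crops containing no road target
    would be kept as negative samples, per claim 8.
    """
    norm = (image.astype(np.float32) - 128.0) / 128.0  # hypothetical formula
    h, w = norm.shape[:2]
    top = np.random.randint(0, h - crop_size + 1)
    left = np.random.randint(0, w - crop_size + 1)
    return norm[top:top + crop_size, left:left + crop_size]

img = np.full((512, 512, 3), 255, dtype=np.uint8)  # dummy road picture
crop = preprocess(img)
```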
9. The method of one of claims 1 to 6, wherein the network model is trained using an Adam optimization method with a base learning rate of 0.001 and a training batch size of 25.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 9 when executing the computer program.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 9.
CN202011129543.5A 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium Active CN111967452B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011129543.5A CN111967452B (en) 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN111967452A CN111967452A (en) 2020-11-20
CN111967452B true CN111967452B (en) 2021-02-02

Family

ID=73387228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011129543.5A Active CN111967452B (en) 2020-10-21 2020-10-21 Target detection method, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111967452B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113283541B (en) * 2021-06-15 2022-07-22 无锡锤头鲨智能科技有限公司 Automatic floor sorting method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256498A (en) * 2018-02-01 2018-07-06 上海海事大学 A kind of non power driven vehicle object detection method based on EdgeBoxes and FastR-CNN
US10223611B1 (en) * 2018-03-08 2019-03-05 Capital One Services, Llc Object detection using image classification models
CN111210439A (en) * 2019-12-26 2020-05-29 中国地质大学(武汉) Semantic segmentation method and device by suppressing uninteresting information and storage device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A target detection method, computer device, and readable storage medium

Effective date of registration: 20230308

Granted publication date: 20210202

Pledgee: Fuyang sub branch of Bank of Hangzhou Co.,Ltd.

Pledgor: Hangzhou xiongmai integrated circuit technology Co.,Ltd.

Registration number: Y2023330000470

CP03 Change of name, title or address

Address after: 311422 4th floor, building 9, Yinhu innovation center, 9 Fuxian Road, Yinhu street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Xinmai Microelectronics Co.,Ltd.

Address before: 311400 4th floor, building 9, Yinhu innovation center, No.9 Fuxian Road, Yinhu street, Fuyang District, Hangzhou City, Zhejiang Province

Patentee before: Hangzhou xiongmai integrated circuit technology Co.,Ltd.