WO2021036013A1 - Detector configuration method and apparatus, electronic device, and storage medium - Google Patents
- Publication number
- WO2021036013A1 (application PCT/CN2019/119161)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- expansion rate
- convolution operation
- convolution
- detector
- fixed
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- the present disclosure relates to the field of computer vision technology, and in particular to a method and device for configuring a detector, a method and device for detecting a target, electronic equipment, and a storage medium.
- Target detection is a very important and basic technology in computer vision, which aims to detect the location and category of the target in the image.
- Target detection technology plays a vital role in a large number of fields, such as pedestrian and vehicle detection in autonomous driving, living body detection in smart homes, and pedestrian detection in security monitoring.
- target detection is also an indispensable link in order to lock a target or provide an initial frame.
- the scale of targets varies widely from one target to another.
- the present disclosure proposes a technical solution for target detection.
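- a method for configuring a detector, including:
- determining a fixed expansion rate of each convolution operation that performs dilated convolution in the detector;
- for any convolution operation that performs dilated convolution in the detector, when the fixed expansion rate of the convolution operation satisfies a decomposition condition, decomposing the convolution operation into a first subconvolution operation and a second subconvolution operation, determining the upper limit expansion rate and the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation, using the upper limit expansion rate as the expansion rate of the first subconvolution operation, and using the lower limit expansion rate as the expansion rate of the second subconvolution operation;
- determining, according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation, the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation.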
- By decomposing the convolution operation into a first subconvolution operation and a second subconvolution operation when its fixed expansion rate satisfies the decomposition condition (for example, when the fixed expansion rate is a non-integer, the convolution operation is decomposed into two subconvolution operations with integer expansion rates), fewer bilinear interpolation operations are introduced during convolution calculation, so the calculation speed can be improved.
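- As a minimal sketch of why a non-integer expansion rate brings in interpolation, the snippet below lists where the taps of a one-dimensional kernel land relative to its centre. The function names and the odd kernel size of 3 are illustrative assumptions, not part of the disclosure.

```python
def tap_offsets(rate, kernel_size=3):
    """Offsets of the kernel taps from the kernel centre along one axis.

    Assumes an odd kernel_size so that the centre tap sits on a pixel.
    """
    half = (kernel_size - 1) / 2
    return [rate * (i - half) for i in range(kernel_size)]


def needs_interpolation(rate, kernel_size=3):
    """True if at least one tap falls between pixel positions."""
    return any(not off.is_integer() for off in tap_offsets(rate, kernel_size))


# A rate of 1.7 puts the outer taps between pixels, so their values must be
# interpolated; the neighbouring integer rates 1 and 2 land exactly on pixels.
print(tap_offsets(1.7), needs_interpolation(1.7))
print(tap_offsets(2), needs_interpolation(2))
```

Under these assumptions, replacing one fractional-rate convolution with two integer-rate subconvolutions means every tap reads a pixel directly.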
- the detector includes a main network;
- the convolution operations that perform dilated convolution in the detector include:
- one or more convolution operations in the main network of the detector whose original convolution kernel size is a specified size.
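- the detector further includes an expansion rate learner;
- the determining the fixed expansion rate of the convolution operation that performs dilated convolution in the detector includes:
- obtaining, through the expansion rate learner, the first expansion rates of the convolution operation for a plurality of training images;
- determining the fixed expansion rate of the convolution operation according to the first expansion rates.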
- Because the fixed expansion rate of the convolution operation is determined from the first expansion rates of the convolution operation for multiple training images, the fixed expansion rate is determined with high accuracy, which helps ensure the accuracy of target detection by the detector.
- the expansion rate learner includes a global average pooling layer and a fully connected layer.
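- the obtaining, by the expansion rate learner, the first expansion rates of the convolution operation for a plurality of training images includes, for any training image of the plurality of training images:
- obtaining, through the expansion rate learner, a second expansion rate of the convolution operation for the training image;
- obtaining a target detection result corresponding to the training image based on the second expansion rate;
- updating the parameters of the expansion rate learner according to the target detection result corresponding to the training image;
- obtaining, through the expansion rate learner after the parameter update, the first expansion rate of the convolution operation for the training image.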
- Performing multiple rounds of learning with the expansion rate learner improves the accuracy of the first expansion rates used to determine the fixed expansion rate, which in turn improves the accuracy of the determined fixed expansion rate and helps ensure the accuracy of target detection by the detector.
- the determining the fixed expansion rate of the convolution operation according to the first expansion rate includes:
- the average value of the first expansion rate is determined as the fixed expansion rate of the convolution operation.
- the fixed expansion rate of the convolution operation satisfying the decomposition condition includes any one of the following:
- the fixed expansion rate of the convolution operation is a non-integer (decimal) number;
- the minimum distance between the fixed expansion rate of the convolution operation and an integer is greater than a first threshold, where this minimum distance is the distance between the fixed expansion rate of the convolution operation and the integer nearest to it.
- When the minimum distance between one of the vertical fixed expansion rate and the horizontal fixed expansion rate of the convolution operation and an integer is less than or equal to the first threshold, that rate need not be decomposed, which reduces the amount of calculation required to configure the detector.
- the determining the upper limit expansion rate and the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation includes:
- determining the integer that is greater than the fixed expansion rate of the convolution operation and closest to it as the upper limit expansion rate corresponding to the fixed expansion rate of the convolution operation;
- determining the integer that is smaller than the fixed expansion rate of the convolution operation and closest to it as the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation.
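- The decomposition condition and the two bounding rates can be sketched as follows. This is only an illustration: the function names are hypothetical, and the threshold value 0.15 used in the example is an assumption, since the text does not give a value for the first threshold.

```python
import math


def min_distance_to_integer(rate):
    """Distance between a fixed expansion rate and the integer nearest to it."""
    return abs(rate - round(rate))


def satisfies_decomposition(rate, first_threshold=0.0):
    """Decompose when the rate is farther from an integer than the threshold.

    With first_threshold=0.0 this reduces to "the rate is a non-integer".
    """
    return min_distance_to_integer(rate) > first_threshold


def bound_rates(rate):
    """Return (upper limit rate, lower limit rate) for a non-integer rate."""
    if rate == int(rate):
        raise ValueError("an integer rate has no distinct upper and lower bounds")
    return math.ceil(rate), math.floor(rate)


# Example with an assumed first threshold of 0.15:
for rate in (1.7, 2.9):
    if satisfies_decomposition(rate, first_threshold=0.15):
        print(rate, "decomposed into rates", bound_rates(rate))
    else:
        print(rate, "kept as a single convolution at rate", round(rate))
```

Keeping a near-integer rate such as 2.9 as a single convolution at the nearest integer is one possible reading of "need not be decomposed"; the text does not state which rate is then used.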
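- the determining the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation includes:
- determining an overall difference coefficient corresponding to the convolution operation according to the difference between the fixed expansion rate of the convolution operation and the lower limit expansion rate;
- determining the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the overall difference coefficient corresponding to the convolution operation.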
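- The text says the channel counts follow from the difference between the fixed expansion rate and the lower limit expansion rate, but it does not give the formula here. The sketch below is therefore an assumption: it takes the coefficient to be that difference for a single rate, gives that fraction of the output channels to the first (upper-rate) subconvolution, and gives the rest to the second. All names are hypothetical, and combining separate vertical and horizontal rates into one overall coefficient is left out.

```python
import math


def difference_coefficient(fixed_rate):
    """Assumed coefficient: fixed rate minus the lower limit rate (in [0, 1))."""
    return fixed_rate - math.floor(fixed_rate)


def split_output_channels(out_channels, coefficient):
    """Assumed split: a `coefficient` share of channels uses the upper rate."""
    upper_channels = round(out_channels * coefficient)
    lower_channels = out_channels - upper_channels
    return upper_channels, lower_channels


# Example: a 256-channel convolution with a fixed expansion rate of 1.5.
coef = difference_coefficient(1.5)
print(split_output_channels(256, coef))
```

One motivation for this choice: if a share a of the channels runs at the upper rate and 1 - a at the lower rate, the channel-weighted average rate is lower + a, which equals the fixed rate when a is the difference above.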
- the method further includes:
- the target training image set is used to train the detector to optimize the parameters of the detector.
- a target detection method, including:
- acquiring an image to be detected;
- performing target detection on the image to be detected with a detector trained by the above-mentioned detector configuration method, to obtain a target detection result corresponding to the image to be detected.
- a detector configuration device including:
- the first determination module is used to determine the fixed expansion rate of the convolution operation of the expansion convolution in the detector
- the second determining module is configured to, for any convolution operation that performs dilated convolution in the detector, when the fixed expansion rate of the convolution operation satisfies the decomposition condition, decompose the convolution operation into a first subconvolution operation and a second subconvolution operation, determine the upper limit expansion rate and the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation, use the upper limit expansion rate as the expansion rate of the first subconvolution operation, and use the lower limit expansion rate as the expansion rate of the second subconvolution operation;
- the third determining module is configured to determine, according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation, the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation.
- the detector includes a main network;
- the convolution operations that perform dilated convolution in the detector include:
- one or more convolution operations in the main network of the detector whose original convolution kernel size is a specified size.
- the detector further includes an expansion rate learner;
- the first determining module includes:
- a first determining submodule configured to obtain, through the expansion rate learner, the first expansion rates of the convolution operation for a plurality of training images;
- a second determining submodule configured to determine the fixed expansion rate of the convolution operation according to the first expansion rates.
- the expansion rate learner includes a global average pooling layer and a fully connected layer.
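- the first determining submodule is used to, for any training image of the plurality of training images:
- obtain, through the expansion rate learner, a second expansion rate of the convolution operation for the training image;
- obtain a target detection result corresponding to the training image based on the second expansion rate;
- update the parameters of the expansion rate learner according to the target detection result corresponding to the training image;
- obtain, through the expansion rate learner after the parameter update, the first expansion rate of the convolution operation for the training image.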
- the second determining submodule is used to:
- the average value of the first expansion rate is determined as the fixed expansion rate of the convolution operation.
- the fixed expansion rate of the convolution operation satisfying the decomposition condition includes any one of the following:
- the fixed expansion rate of the convolution operation is a non-integer (decimal) number;
- the minimum distance between the fixed expansion rate of the convolution operation and an integer is greater than a first threshold, where this minimum distance is the distance between the fixed expansion rate of the convolution operation and the integer nearest to it.
- the second determining module includes:
- the third determining submodule is configured to determine the integer that is greater than the fixed expansion rate of the convolution operation and closest to it as the upper limit expansion rate corresponding to the fixed expansion rate of the convolution operation;
- the fourth determining submodule is configured to determine the integer that is smaller than the fixed expansion rate of the convolution operation and closest to it as the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation.
- the third determining module includes:
- a fifth determining sub-module configured to determine the overall difference coefficient corresponding to the convolution operation according to the difference between the fixed expansion rate of the convolution operation and the lower limit expansion rate;
- the sixth determining submodule is configured to determine the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the overall difference coefficient corresponding to the convolution operation.
- it also includes:
- the training module is used to train the detector by using the target training image set to optimize the parameters of the detector.
- a target detection device including:
- the acquisition module is used to acquire the image to be detected
- the target detection module is configured to perform target detection on the image to be detected by using the detector trained by the above-mentioned detector configuration device to obtain a target detection result corresponding to the image to be detected.
- an electronic device including:
- one or more processors;
- a memory associated with the one or more processors, where the memory is used to store executable instructions that, when read and executed by the one or more processors, perform the above-mentioned detector configuration method.
- a computer-readable storage medium having computer program instructions stored thereon, and when the computer program instructions are executed by a processor, the above-mentioned configuration method of the detector is realized.
- a computer program including computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the above method.
- In the embodiments of the present disclosure, for any convolution operation that performs dilated convolution in the detector, when the fixed expansion rate of the convolution operation satisfies the decomposition condition, the convolution operation is decomposed into a first subconvolution operation and a second subconvolution operation, and the upper limit expansion rate and the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation are determined. The upper limit expansion rate is used as the expansion rate of the first subconvolution operation, and the lower limit expansion rate is used as the expansion rate of the second subconvolution operation. The number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation are determined according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation. Decomposing the convolution operations that perform dilated convolution in the detector in this way reduces the relatively time-consuming bilinear interpolation operations introduced during convolution calculation, thereby improving the calculation speed and reducing the time required for target detection, so that the detector can be applied to real-time scenes.
- Fig. 1 shows a flowchart of a method for configuring a detector provided by an embodiment of the present disclosure.
- Fig. 2 shows a schematic diagram of an expansion rate learner in a detector configuration method provided by an embodiment of the present disclosure.
- FIG. 3 shows a schematic diagram of the number of output channels corresponding to the first subconvolution operation Conv_u and the number of output channels corresponding to the second subconvolution operation Conv_l in the detector configuration method provided by an embodiment of the present disclosure.
- FIG. 4 shows a schematic diagram of decomposing a convolution operation that performs dilated convolution in the detector into two subconvolution operations Conv_u and Conv_l in the detector configuration method provided by an embodiment of the present disclosure.
- Fig. 5 shows a schematic diagram of a method for configuring a detector provided by an embodiment of the present disclosure.
- Fig. 6 shows a block diagram of a detector configuration device provided by an embodiment of the present disclosure.
- FIG. 7 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure.
- FIG. 8 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure.
- embodiments of the present disclosure provide a detector configuration method and device, a target detection method and device, electronic equipment, and a storage medium to reduce the time required for target detection, thereby making target detection suitable for real-time scenarios.
- Fig. 1 shows a flowchart of a method for configuring a detector provided by an embodiment of the present disclosure.
- the execution subject of the detector configuration method may be a detector configuration device.
- the configuration method of the detector can be executed by a terminal device or a server or other processing device.
- the terminal device can be a user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
- the configuration method of the detector may be implemented by a processor invoking computer-readable instructions stored in the memory. As shown in Fig. 1, the configuration method of the detector includes step S11 to step S13.
- the detector type of the detector and the main network of the detector can be determined first.
- the detector type of the detector can be Faster-RCNN, RFCN, RetinaNet, or SSD, etc.
- the main network of the detector can be VGG, ResNet, ResNeXt, etc.
- In step S11, the fixed expansion rate of each convolution operation that performs dilated convolution in the detector is determined.
- the number of convolution operations for dilation convolution in the detector may be one or more.
- the convolution operation for dilation convolution in the detector may be part or all of the convolution operation in the detector. That is, the detector may include a convolution operation that performs dilation convolution, or may include a convolution operation that does not perform dilation convolution.
- the expansion rate of the same convolution operation of the detector for different training images may be different or the same.
- the expansion rate of different convolution operations of the detector for the same training image can be different or the same.
- the expansion rate of the convolution operation may include a longitudinal expansion rate and a lateral expansion rate.
- the longitudinal expansion rate and the lateral expansion rate of the convolution operation may be different or the same.
- the fixed expansion rate may include a longitudinal fixed expansion rate and a lateral fixed expansion rate.
- the first expansion rate hereinafter may include a first longitudinal expansion rate and a first lateral expansion rate
- the second expansion rate may include a second longitudinal expansion rate and a second lateral expansion rate.
- the expansion rate of the convolution operation may not be divided into the longitudinal expansion rate and the lateral expansion rate.
- expanded convolution kernel size = expansion rate × (original convolution kernel size − 1) + 1.
- longitudinal size of the expanded convolution kernel = longitudinal expansion rate × (longitudinal size of the original convolution kernel − 1) + 1.
- lateral size of the expanded convolution kernel = lateral expansion rate × (lateral size of the original convolution kernel − 1) + 1.
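- The kernel size relation above can be checked with a few lines of code. The function name is illustrative and not from the disclosure.

```python
def expanded_kernel_size(rate, original_size):
    """Expanded size = expansion rate x (original size - 1) + 1, per axis."""
    return rate * (original_size - 1) + 1


# A 3x3 kernel with longitudinal rate 2 and lateral rate 1 covers a 5x3 window.
longitudinal = expanded_kernel_size(2, 3)
lateral = expanded_kernel_size(1, 3)
print(longitudinal, lateral)  # 5 3
```

An expansion rate of 1 leaves the kernel size unchanged, which matches an ordinary (non-dilated) convolution.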
- the detector includes a main network; the convolution operations that perform dilated convolution in the detector include: one or more convolution operations in the main network of the detector whose original convolution kernel size is a specified size.
- the specified size may include 3×3, or the specified size may include 5×5, 7×7, and so on.
- the convolution operations that perform dilated convolution in the detector include: all convolution operations in the main network of the detector whose original convolution kernel size is a specified size.
- For example, if the main network is ResNet, the convolution operations that perform dilated convolution in the detector may include all 3×3 convolution operations in conv2, conv3, conv4, and conv5 of ResNet.
- the convolution operations that perform dilated convolution in the detector include: some of the convolution operations in the main network of the detector whose original convolution kernel size is a specified size.
- the convolution operations that perform dilated convolution in the detector may include: one or more convolution operations, in specified convolutional layers of the main network of the detector, whose original convolution kernel size is a specified size.
- For example, if the main network is ResNet and the specified convolutional layers are conv3, conv4, and conv5, the convolution operations that perform dilated convolution in the detector may include all 3×3 convolution operations in conv3, conv4, and conv5 of ResNet, and may exclude the 3×3 convolution operations in conv2.
- the convolution operations that perform dilated convolution in the detector may include: the convolution operations in specified convolutional layers of the main network of the detector.
- For example, if the main network is ResNet, the convolution operations that perform dilated convolution in the detector may include the convolution operations in conv2, conv3, conv4, and conv5.
- the convolution operation of performing dilation convolution in the detector may further include: a convolution operation outside the main network in the detector.
- the convolution operation of dilated convolution in the detector may also include a convolution operation in which the size of the original convolution kernel outside the main network in the detector is a specified size.
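- The selection rules above (by original kernel size, optionally restricted to specified convolutional layers) can be sketched as a simple filter. The `ConvOp` record, the operation names, and the default arguments are hypothetical and only stand in for however a real detector describes its layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvOp:
    name: str          # hypothetical identifier of the convolution operation
    stage: str         # e.g. "conv2" ... "conv5" in a ResNet-style main network
    kernel_size: int   # original (square) kernel size


def select_dilated_ops(ops, kernel_size=3, stages=("conv3", "conv4", "conv5")):
    """Pick the operations that will perform dilated convolution."""
    return [op for op in ops if op.kernel_size == kernel_size and op.stage in stages]


ops = [
    ConvOp("conv2_block1_3x3", "conv2", 3),
    ConvOp("conv3_block1_1x1", "conv3", 1),
    ConvOp("conv3_block1_3x3", "conv3", 3),
    ConvOp("conv5_block2_3x3", "conv5", 3),
]
print([op.name for op in select_dilated_ops(ops)])
```

Passing `stages=("conv2", "conv3", "conv4", "conv5")` would correspond to the variant that dilates every 3×3 convolution in the main network.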
- the detector further includes an expansion rate learner; the determining the fixed expansion rate of the convolution operation that performs dilated convolution in the detector includes: obtaining, through the expansion rate learner, the first expansion rates of the convolution operation for a plurality of training images; and determining the fixed expansion rate of the convolution operation according to the first expansion rates.
- Because the fixed expansion rate of the convolution operation is determined from the first expansion rates of the convolution operation for multiple training images, the fixed expansion rate is determined with high accuracy, which helps ensure the accuracy of target detection by the detector.
- the expansion rate learner may be used to learn the expansion rate of the convolution operation for the training image.
- the expansion rate learner may have a one-to-one correspondence with the convolution operation of the expansion convolution in the detector. That is, an expansion rate learner can be used to learn the expansion rate of a convolution operation that performs expansion convolution.
- the expansion rate learner may be set between the convolution operation that performs the expansion convolution and the previous operation of the convolution operation that performs the expansion convolution.
- the expansion rate learner includes a global average pooling layer and a fully connected layer.
- the first expansion rate of the convolution operation for multiple training images can be obtained through a global average pooling operation and a fully connected operation.
- Based on the features before the convolution operation (that is, the input feature map of the convolution operation in the initial structure of the detector), the global average pooling operation and the fully connected operation predict the expansion rate of the convolution operation for the training image.
- the expansion rate learner may include a global average pooling (GAP) layer and a fully connected layer.
- the fully connected layer may be a linear layer.
- the global average pooling layer and the fully connected layer can be connected before the convolution operation, and the convolution operation can be replaced with a deformable convolution that uses the predicted expansion rate to perform the convolution.
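- A minimal, framework-free sketch of such a learner is given below: global average pooling reduces each input channel to one number, and a linear layer maps the pooled vector to an expansion rate. Producing two outputs (vertical and horizontal) and the example weights are assumptions made for illustration; the deformable convolution that would consume the prediction is not shown.

```python
def global_average_pool(feature_map):
    """feature_map is a list of channels, each an H x W list of lists."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def linear(vector, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def predict_expansion_rate(feature_map, weights, bias):
    """Predict (vertical rate, horizontal rate) from the input feature map."""
    pooled = global_average_pool(feature_map)
    vertical, horizontal = linear(pooled, weights, bias)
    return vertical, horizontal


# Two 2x2 channels; weights and bias are made-up values for the example.
feature_map = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
weights = [[0.5, 0.0], [0.0, 0.25]]
bias = [1.0, 1.0]
print(predict_expansion_rate(feature_map, weights, bias))  # (3.0, 1.5)
```

Because the prediction depends on the input feature map, the same convolution operation can receive different rates for different training images, as described earlier.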
- the obtaining, by the expansion rate learner, the first expansion rates of the convolution operation for a plurality of training images includes, for any training image of the plurality of training images: obtaining, through the expansion rate learner, a second expansion rate of the convolution operation for the training image; obtaining a target detection result corresponding to the training image based on the second expansion rate; updating the parameters of the expansion rate learner according to the target detection result corresponding to the training image; and obtaining, through the expansion rate learner after the parameter update, the first expansion rate of the convolution operation for the training image.
- According to the second expansion rate, for the training image, of each convolution operation that performs dilated convolution in the detector, the expanded convolution kernel size corresponding to each such convolution operation may be determined, and the target detection result corresponding to the training image is obtained based on the detector with the expanded kernels.
- the target detection result corresponding to the training image may include the position information of the target detection frame in the training image and the probability that the training image belongs to each category. According to the target detection result corresponding to the training image and the true value of the training image, the value of the loss function of the detector can be obtained, so that the parameters of the expansion rate learner can be updated according to the value of the loss function of the detector.
- the number of times of training the expansion rate for any training image may be a preset value, for example, the preset value may be 13; or, for any training image, training may be performed until the expansion rate converges.
- the accuracy of the first expansion rate used to determine the fixed expansion rate can be improved, and thus the accuracy of the determined fixed expansion rate can be improved, which helps ensure the accuracy of target detection by the detector.
- the first expansion rate of the convolution operation for a training image may refer to the expansion rate of the convolution operation for that training image after the training of the expansion rate on that image is completed. That is, it may refer to the expansion rate of the convolution operation for the training image after the number of times the expansion rate has been trained on that image reaches a preset value, or it may refer to the converged expansion rate of the convolution operation for the training image.
- the detector trains the expansion rate separately for different training images, so that for any convolutional layer of the detector that performs dilated convolution, multiple first expansion rates corresponding to multiple training images can be obtained.
- determining the fixed expansion rate of the convolution operation according to the first expansion rate includes: determining the average value of the first expansion rates as the fixed expansion rate of the convolution operation. For example, if the fixed expansion rate of the convolution operation includes a vertical fixed expansion rate and a horizontal fixed expansion rate, the average value of the first vertical expansion rates of the convolution operation for a plurality of training images may be determined as the vertical fixed expansion rate of the convolution operation, and the average value of the first horizontal expansion rates of the convolution operation for the plurality of training images may be determined as the horizontal fixed expansion rate of the convolution operation. For example, the vertical fixed expansion rate is 1.7, and the horizontal fixed expansion rate is 2.9.
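- the averaging step can be sketched as follows. The function name and the three sample rate pairs are made up for illustration; they are chosen so that the averages come out close to the 1.7 and 2.9 used in the example above.

```python
def fixed_expansion_rate(first_rates):
    """Average per-image (vertical, horizontal) first expansion rates."""
    n = len(first_rates)
    vertical = sum(v for v, _ in first_rates) / n
    horizontal = sum(h for _, h in first_rates) / n
    return vertical, horizontal

# Hypothetical first expansion rates of one convolution for three training images.
rates = [(1.5, 3.0), (1.9, 2.8), (1.7, 2.9)]
vertical, horizontal = fixed_expansion_rate(rates)  # close to (1.7, 2.9)
```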
- the fixed expansion rate of the convolution operation can be determined according to the first expansion rate of the convolution operation for part of the training images (for example, 1000 training images).
- the fixed dilation rate of the convolution operation may be determined according to the first dilation rate of the convolution operation for all training images.
- in step S12, for any convolution operation of dilated convolution in the detector, if the fixed expansion rate of the convolution operation satisfies the decomposition condition, the convolution operation is decomposed into a first subconvolution operation and a second subconvolution operation, the upper limit expansion rate and the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation are determined, the upper limit expansion rate is used as the expansion rate of the first subconvolution operation, and the lower limit expansion rate is used as the expansion rate of the second subconvolution operation.
- the fixed expansion rate of the convolution operation is D
- the upper expansion rate corresponding to the fixed expansion rate of the convolution operation is Du
- the lower expansion rate corresponding to the fixed expansion rate of the convolution operation is Dl.
- the fixed expansion rate of the convolution operation satisfying the decomposition condition includes either of the following: the fixed expansion rate of the convolution operation is a decimal; or the minimum distance between the fixed expansion rate of the convolution operation and an integer is greater than the first threshold, where the minimum distance between the fixed expansion rate of the convolution operation and an integer is the distance between the fixed expansion rate and the integer closest to it.
- the fixed expansion rate of the convolution operation being a decimal may mean that at least one of the longitudinal fixed expansion rate and the lateral fixed expansion rate is a decimal.
- the minimum distance between the fixed expansion rate of the convolution operation and an integer being greater than the first threshold may mean that the minimum distance between at least one of the vertical fixed expansion rate and the horizontal fixed expansion rate of the convolution operation and an integer is greater than the first threshold.
- for example, the first threshold is 0.05, the vertical fixed expansion rate of a certain convolution operation is 2.02, and the horizontal fixed expansion rate is 1.7. The minimum distance between the vertical fixed expansion rate of the convolution operation and an integer is 0.02, which is less than the first threshold; the minimum distance between the horizontal fixed expansion rate of the convolution operation and an integer is 0.3, which is greater than the first threshold. Therefore, it can be determined that the convolution operation satisfies the decomposition condition.
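- the threshold form of the decomposition condition can be sketched as below. The function names are hypothetical, and the default threshold of 0.05 is simply the value used in the example above.

```python
def min_distance_to_integer(rate):
    """Distance between a rate and the integer closest to it."""
    return abs(rate - round(rate))

def satisfies_decomposition_condition(vertical, horizontal, first_threshold=0.05):
    """True if at least one of the two fixed rates is farther than the threshold from an integer."""
    return (min_distance_to_integer(vertical) > first_threshold
            or min_distance_to_integer(horizontal) > first_threshold)

# Example from the text: vertical 2.02 is within the threshold, horizontal 1.7 is not.
needs_decomposition = satisfies_decomposition_condition(2.02, 1.7)
```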
- if the minimum distance between one of the vertical fixed expansion rate and the horizontal fixed expansion rate of the convolution operation and an integer is less than or equal to the first threshold, and the minimum distance between the other item and an integer is greater than the first threshold, the convolution operation can be decomposed according to this other item.
- for example, if the vertical fixed expansion rate of the convolution operation is 2.02 and the horizontal fixed expansion rate is 1.7, the vertical expansion rate of the first subconvolution operation is 2 and its horizontal expansion rate is 2, and the vertical expansion rate of the second subconvolution operation is 2 and its horizontal expansion rate is 1.
- when the minimum distance between one item of the vertical fixed expansion rate and the horizontal fixed expansion rate of the convolution operation and an integer is less than or equal to the first threshold, that item may not be decomposed, so that the amount of calculation of the configured detector can be reduced.
- determining the upper limit expansion rate and the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation includes: determining the integer that is greater than the fixed expansion rate of the convolution operation and closest to it as the upper limit expansion rate corresponding to the fixed expansion rate of the convolution operation; and determining the integer that is less than the fixed expansion rate of the convolution operation and closest to it as the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation.
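- for example, if the longitudinal fixed expansion rate of the convolution operation is 1.7 and the lateral fixed expansion rate is 2.9, the upper longitudinal expansion rate can be determined as 2, the lower longitudinal expansion rate as 1, the upper lateral expansion rate as 3, and the lower lateral expansion rate as 2.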
- the vertical upper limit expansion rate 2 and the horizontal upper limit expansion rate 3 can be determined as the expansion rate of the first subconvolution operation, and the vertical lower limit expansion rate 1 and the horizontal lower limit expansion rate 2 can be determined as the expansion rate of the second subconvolution operation.
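- the choice of upper and lower limit expansion rates can be sketched as follows, under the assumption that an axis whose fixed rate lies within the first threshold of an integer is simply rounded to that integer and not decomposed. The function names are hypothetical.

```python
import math

def split_rate(rate, first_threshold=0.05):
    """Return (upper, lower) integer expansion rates for one axis."""
    nearest = round(rate)
    if abs(rate - nearest) <= first_threshold:
        return nearest, nearest              # this axis is not decomposed
    return math.ceil(rate), math.floor(rate)

def subconvolution_rates(vertical, horizontal, first_threshold=0.05):
    """Return the (vertical, horizontal) rates of the first and second subconvolution."""
    v_up, v_low = split_rate(vertical, first_threshold)
    h_up, h_low = split_rate(horizontal, first_threshold)
    return (v_up, h_up), (v_low, h_low)

first, second = subconvolution_rates(1.7, 2.9)   # ((2, 3), (1, 2))
```

With the fixed rates 2.02 and 1.7 from the earlier example, the same function yields (2, 2) for the first subconvolution and (2, 1) for the second.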
- by decomposing the convolution operation into a first subconvolution operation and a second subconvolution operation when the fixed expansion rate of the convolution operation satisfies the decomposition condition (for example, when the fixed expansion rate is a decimal, decomposing the convolution operation into first and second subconvolution operations whose expansion rates are integers), the bilinear interpolation operations introduced during the convolution calculation can be reduced, so that the calculation speed can be improved.
- in step S13, the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation are determined according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation.
- the number of output channels of the convolution operation is C
- the number of output channels corresponding to the first sub-convolution operation is Cu
- the number of output channels corresponding to the second sub-convolution operation is Cl.
- determining the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation includes: determining the overall difference coefficient corresponding to the convolution operation according to the difference between the fixed expansion rate of the convolution operation and the lower limit expansion rate; and determining the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the overall difference coefficient corresponding to the convolution operation.
- the overall difference coefficient corresponding to the convolution operation may be determined according to the difference D-Dl between the fixed expansion rate D of the convolution operation and the lower limit expansion rate Dl.
- the first difference between the longitudinal fixed expansion rate and the longitudinal lower limit expansion rate of the convolution operation can be determined, the second difference between the lateral fixed expansion rate and the lateral lower limit expansion rate of the convolution operation can be determined, and the average of the first difference and the second difference can be used as the overall difference coefficient corresponding to the convolution operation.
- for example, if the fixed expansion rate of the convolution operation includes a longitudinal fixed expansion rate of 1.7 and a lateral fixed expansion rate of 2.9, the first difference is 1.7-1=0.7, the second difference is 2.9-2=0.9, and the overall difference coefficient corresponding to the convolution operation is a=0.8.
- the number of output channels corresponding to the first sub-convolution operation is Cu=aC
- the number of output channels corresponding to the second sub-convolution operation is Cl=(1-a)C.
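- the channel allocation can be sketched as below. The function name is hypothetical, and rounding aC to the nearest integer (with the remainder assigned to the second subconvolution) is an assumption, since the text does not say how non-integer channel counts are handled.

```python
def split_output_channels(c_out, fixed_rate, lower_rate):
    """Split c_out output channels between the first and second subconvolution.

    fixed_rate, lower_rate: (vertical, horizontal) pairs.
    Returns (a, c_u, c_l) with c_u + c_l == c_out.
    """
    diffs = [d - d_l for d, d_l in zip(fixed_rate, lower_rate)]
    a = sum(diffs) / len(diffs)   # overall difference coefficient
    c_u = round(a * c_out)        # assumed rounding of Cu = aC
    c_l = c_out - c_u             # Cl = (1 - a)C, taken as the remainder
    return a, c_u, c_l

# Example from the text: fixed rates (1.7, 2.9), lower limits (1, 2), so a is about 0.8.
a, c_u, c_l = split_output_channels(256, (1.7, 2.9), (1, 2))
```

For a 256-channel convolution this assigns 205 channels to the first subconvolution and 51 to the second.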
- FIG. 3 shows a schematic diagram of the number of output channels corresponding to the first subconvolution operation Conv u and the number of output channels corresponding to the second subconvolution operation Conv l in the detector configuration method provided by an embodiment of the present disclosure.
- the longitudinal expansion rate of the first subconvolution operation Conv u is 2 and the lateral expansion rate is 3
- the longitudinal expansion rate of the second subconvolution operation Conv l is 1 and the lateral expansion rate is 2.
- H×W×C in represents the height, width and number of channels of the input feature map of the convolution operation.
- the height, width and number of channels of the input feature maps of the first subconvolution operation Conv u and the second subconvolution operation Conv l are also H×W×C in.
- C out represents the number of output channels of the convolution operation, and the vertical fixed expansion rate of the convolution operation is 1.7 and the horizontal fixed expansion rate is 2.9.
- the number of output channels corresponding to the first subconvolution operation Conv u is 0.8 × C out
- the number of output channels corresponding to the second subconvolution operation Conv l is 0.2 × C out.
- the overall difference coefficient corresponding to the convolution operation may also be determined according to the difference between the fixed expansion rate of the convolution operation and the upper limit expansion rate.
- the time-consuming bilinear interpolation operation can be reduced during the convolution calculation process, thereby improving the calculation speed and reducing the time required for target detection, so that the method can be applied to real-time scenes.
- the method further includes: training the detector with a target training image set to optimize the parameters of the detector.
- the detector may no longer include an expansion rate learner, and the convolution operation of dilated convolution in the detector can be decomposed into two subconvolution operations.
- FIG. 4 shows a schematic diagram of decomposing the convolution operation of dilated convolution in the detector into two sub-convolution operations Conv u and Conv l in the detector configuration method provided by the embodiment of the present disclosure.
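- a decomposed 3×3 convolution can be sketched in NumPy as follows. The naive implementation, the zero padding that keeps the output at H×W, and in particular the concatenation of the two subconvolution outputs along the channel axis are assumptions made for illustration; the text only states that the two subconvolutions share the input and split the output channels.

```python
import numpy as np

def dilated_conv3x3(x, w, dilation):
    """Naive 3x3 dilated convolution with zero padding that preserves H and W.

    x: (H, W, C_in); w: (3, 3, C_in, C_out); dilation: integer (d_h, d_w).
    """
    d_h, d_w = dilation
    H, W, _ = x.shape
    padded = np.pad(x, ((d_h, d_h), (d_w, d_w), (0, 0)))
    out = np.zeros((H, W, w.shape[-1]))
    for i in range(3):
        for j in range(3):
            # Shifted view of the input for kernel tap (i, j).
            patch = padded[i * d_h:i * d_h + H, j * d_w:j * d_w + W, :]
            out += patch @ w[i, j]
    return out

def decomposed_conv(x, w_u, w_l, rate_u, rate_l):
    """Run Conv u and Conv l on the same input and join their outputs (assumed concatenation)."""
    return np.concatenate(
        [dilated_conv3x3(x, w_u, rate_u), dilated_conv3x3(x, w_l, rate_l)], axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 10, 4))        # made-up H x W x C_in input
w_u = rng.standard_normal((3, 3, 4, 13))   # about 0.8 of 16 output channels
w_l = rng.standard_normal((3, 3, 4, 3))    # remaining channels
y = decomposed_conv(x, w_u, w_l, rate_u=(2, 3), rate_l=(1, 2))
```

Because both subconvolutions use integer expansion rates, every kernel tap reads an exact pixel and no bilinear interpolation is needed.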
- Fig. 5 shows a schematic diagram of a method for configuring a detector provided by an embodiment of the present disclosure.
- the main network of the detector is ResNet, and each of the 3×3 convolution operations in Res2, Res3, Res4, and Res5 is decomposed into two sub-convolution operations.
- when training the detector, the momentum can be set to 0.9, the weight decay rate to 0.0001, and the initial learning rate to 0.00125 per training image.
- the training time can be set to 13 cycles, and the learning rate can be reduced by a factor of 10 after the 8th cycle and again after the 11th cycle.
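- the learning-rate schedule described above can be sketched as a small function. Scaling the per-image rate linearly by the number of images per batch is an assumption suggested by the phrase "per training image"; the function name and the batch size of 16 are hypothetical.

```python
def learning_rate(epoch, images_per_batch, lr_per_image=0.00125,
                  decay_after=(8, 11), factor=10.0):
    """Learning rate for a 1-based training cycle (epoch)."""
    lr = lr_per_image * images_per_batch   # assumed linear scaling with batch size
    for boundary in decay_after:
        if epoch > boundary:               # reduced after the 8th and the 11th cycle
            lr /= factor
    return lr

# Schedule over the 13 training cycles for a hypothetical batch of 16 images.
schedule = [learning_rate(e, images_per_batch=16) for e in range(1, 14)]
```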
- the detector configuration method provided in the embodiments of the present disclosure can be applied to scenes that need to be hard-coded. Under the premise of ensuring that multi-scale targets can be processed, the adaptive module is removed, and the effect of reducing time consumption and improving detection speed is achieved.
- compared with the adaptive method, the hard coding method provided by the embodiments of the present disclosure is more compatible with hardware acceleration, which is beneficial to practical applications.
- the embodiment of the present disclosure also provides a target detection method, which includes: acquiring an image to be detected; and using the detector trained by the above-mentioned detector configuration method to perform target detection on the image to be detected, to obtain the target detection result corresponding to the image to be detected.
- the embodiments of the present disclosure use a deep learning network with an expansion ratio structure to perform target detection, which can accurately detect targets of multiple scales at the same time, and can reduce the time required for multi-scale target detection on the premise of ensuring the accuracy of target detection. It can be applied to real-time scenes of multi-scale target detection. For example, the embodiments of the present disclosure can be applied to the detection of vehicles and pedestrians of different sizes and distances in automatic driving, key frame detection in real-time intelligent video analysis, pedestrian detection in security monitoring, and living body detection in smart homes.
- the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible inner logic.
- the present disclosure also provides a detector configuration device, a target detection device, an electronic device, a computer-readable storage medium, and a program.
- a detector configuration device for detecting a target of a target detection device.
- an electronic device for detecting a target of a target detection device.
- a computer-readable storage medium for storing program code.
- Fig. 6 shows a block diagram of a detector configuration device provided by an embodiment of the present disclosure.
- the configuration device of the detector includes: a first determining module 21, configured to determine the fixed expansion rate of the convolution operation of the dilated convolution in the detector; a second determining module 22, configured to, for any convolution operation of dilated convolution in the detector, decompose the convolution operation into a first subconvolution operation and a second subconvolution operation when the fixed expansion rate of the convolution operation satisfies the decomposition condition; and a third determining module 23, configured to determine the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the fixed expansion rate of the convolution operation.
- the detector includes a subject network, and the convolution operation of the dilated convolution in the detector includes: one or more convolution operations in the subject network of the detector whose original convolution kernel size is a specified size.
- the detector further includes an expansion rate learner; the first determining module 21 includes: a first determining sub-module, configured to obtain, through the expansion rate learner, the first expansion rate of the convolution operation for a plurality of training images; and a second determining sub-module, configured to determine the fixed expansion rate of the convolution operation according to the first expansion rate.
- the expansion rate learner includes a global average pooling layer and a fully connected layer.
- the first determining submodule is configured to: for any training image among the multiple training images, obtain the second expansion rate of the convolution operation for the training image through the expansion rate learner; obtain the target detection result corresponding to the training image based on the second expansion rate; update the parameters of the expansion rate learner according to the target detection result corresponding to the training image; and obtain the first expansion rate of the convolution operation for the training image through the expansion rate learner after the parameter update.
- the second determining submodule is configured to determine the average value of the first expansion rate as the fixed expansion rate of the convolution operation.
- the fixed expansion rate of the convolution operation satisfying the decomposition condition includes either of the following: the fixed expansion rate of the convolution operation is a decimal; or the minimum distance between the fixed expansion rate of the convolution operation and an integer is greater than the first threshold, where the minimum distance between the fixed expansion rate of the convolution operation and an integer is the distance between the fixed expansion rate and the integer closest to it.
- the second determining module 22 includes: a third determining sub-module, configured to determine the integer that is greater than the fixed expansion rate of the convolution operation and closest to it as the upper limit expansion rate corresponding to the fixed expansion rate of the convolution operation; and a fourth determining sub-module, configured to determine the integer that is less than the fixed expansion rate of the convolution operation and closest to it as the lower limit expansion rate corresponding to the fixed expansion rate of the convolution operation.
- the third determining module 23 includes: a fifth determining sub-module, configured to determine the overall difference coefficient corresponding to the convolution operation according to the difference between the fixed expansion rate of the convolution operation and the lower limit expansion rate; and a sixth determining sub-module, configured to determine the number of output channels corresponding to the first subconvolution operation and the number of output channels corresponding to the second subconvolution operation according to the number of output channels of the convolution operation and the overall difference coefficient corresponding to the convolution operation.
- the device further includes: a training module configured to train the detector by using the target training image set to optimize the parameters of the detector.
- the embodiment of the present disclosure also provides a target detection device, including: an acquisition module for acquiring an image to be detected; and a target detection module for using the detector trained by the above-mentioned detector configuration device to perform target detection on the image to be detected, to obtain the target detection result corresponding to the image to be detected.
- the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
- the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
- the computer-readable storage medium may be a non-volatile computer-readable storage medium, or may be a volatile computer-readable storage medium.
- the embodiment of the present disclosure also proposes a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the above method.
- an embodiment of the present disclosure also provides an electronic device, including: one or more processors; and a memory associated with the one or more processors, where the memory is used to store executable instructions that, when read and executed by the one or more processors, perform the foregoing method.
- the electronic device can be provided as a terminal, server or other form of device.
- FIG. 7 shows a block diagram of an electronic device 800 provided by an embodiment of the present disclosure.
- the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
- the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
- the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
- the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
- the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
- the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
- the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method operating on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
- the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
- the power supply component 806 provides power for various components of the electronic device 800.
- the power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
- the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
- the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
- the touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
- the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
- the audio component 810 is configured to output and/or input audio signals.
- the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
- the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
- the audio component 810 further includes a speaker for outputting audio signals.
- the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
- the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
- the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
- the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example, the display and the keypad of the electronic device 800. The sensor component 814 can also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a temperature change of the electronic device 800.
- the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
- the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
- the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
- the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
- the electronic device 800 can access a wireless network based on a communication standard, such as Wi-Fi, 2G, 3G, 4G/LTE, 5G, or a combination thereof.
- the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
- the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
- the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
- the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, to implement the above methods.
- a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
- FIG. 8 shows a block diagram of an electronic device 1900 provided by an embodiment of the present disclosure.
- the electronic device 1900 may be provided as a server.
- the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs.
- the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
- the processing component 1922 is configured to execute instructions to perform the above-described methods.
- the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
- the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows, Mac OS, or similar.
- a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
- the present disclosure may be a system, method and/or computer program product.
- the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present disclosure.
- the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
- the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), memory sticks, floppy disks, and mechanical encoding devices, such as a printer with instructions stored thereon.
- the computer-readable storage medium used here is not to be interpreted as a transient signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber optic cables), or electrical signals transmitted through wires.
- the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
- the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
- the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
- the computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
- an electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the status information of the computer-readable program instructions.
- the computer-readable program instructions are executed to realize various aspects of the present disclosure.
- These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce a device that implements the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium. These instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
- each block in the flowchart or block diagram may represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for realizing the specified logical function.
- The functions marked in the blocks may also occur in a different order than the order marked in the drawings. For example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
- each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
- Apparatus For Radiation Diagnosis (AREA)
- Image Processing (AREA)
- Electrically Operated Instructional Devices (AREA)
Claims (25)
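- A method for configuring a detector, comprising: determining a fixed dilation rate of a convolution operation that performs dilated convolution in the detector; for any convolution operation that performs dilated convolution in the detector, in a case where the fixed dilation rate of the convolution operation satisfies a decomposition condition, decomposing the convolution operation into a first sub-convolution operation and a second sub-convolution operation, determining an upper-bound dilation rate and a lower-bound dilation rate corresponding to the fixed dilation rate of the convolution operation, taking the upper-bound dilation rate as the dilation rate of the first sub-convolution operation, and taking the lower-bound dilation rate as the dilation rate of the second sub-convolution operation; and determining, according to the number of output channels of the convolution operation and the fixed dilation rate of the convolution operation, the number of output channels corresponding to the first sub-convolution operation and the number of output channels corresponding to the second sub-convolution operation.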
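- The method according to claim 1, wherein the detector comprises a backbone network, and the convolution operation that performs dilated convolution in the detector comprises: one or more convolution operations in the backbone network of the detector whose original convolution kernel size is a specified size.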
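- The method according to claim 1 or 2, wherein the detector further comprises a dilation rate learner; and the determining a fixed dilation rate of a convolution operation that performs dilated convolution in the detector comprises: obtaining, through the dilation rate learner, first dilation rates of the convolution operation for a plurality of training images; and determining the fixed dilation rate of the convolution operation according to the first dilation rates.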
- The method according to claim 3, wherein the dilation rate learner comprises a global average pooling layer and a fully connected layer.
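- The method according to claim 3 or 4, wherein the obtaining, through the dilation rate learner, first dilation rates of the convolution operation for a plurality of training images comprises: for any training image of the plurality of training images, obtaining, through the dilation rate learner, a second dilation rate of the convolution operation for the training image; obtaining a target detection result corresponding to the training image based on the second dilation rate; updating parameters of the dilation rate learner according to the target detection result corresponding to the training image; and obtaining, through the dilation rate learner with updated parameters, the first dilation rate of the convolution operation for the training image.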
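- The method according to any one of claims 3 to 5, wherein the determining the fixed dilation rate of the convolution operation according to the first dilation rates comprises: determining the average value of the first dilation rates as the fixed dilation rate of the convolution operation.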
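- The method according to any one of claims 1 to 6, wherein the fixed dilation rate of the convolution operation satisfying the decomposition condition comprises any one of the following: the fixed dilation rate of the convolution operation is a non-integer value; or the minimum distance between the fixed dilation rate of the convolution operation and an integer is greater than a first threshold, wherein the minimum distance between the fixed dilation rate of the convolution operation and an integer denotes the distance between the fixed dilation rate of the convolution operation and the integer closest to the fixed dilation rate of the convolution operation.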
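- The method according to any one of claims 1 to 7, wherein the determining an upper-bound dilation rate and a lower-bound dilation rate corresponding to the fixed dilation rate of the convolution operation comprises: determining the integer that is greater than and closest to the fixed dilation rate of the convolution operation as the upper-bound dilation rate corresponding to the fixed dilation rate of the convolution operation; and determining the integer that is less than and closest to the fixed dilation rate of the convolution operation as the lower-bound dilation rate corresponding to the fixed dilation rate of the convolution operation.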
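- The method according to any one of claims 1 to 8, wherein the determining, according to the number of output channels of the convolution operation and the fixed dilation rate of the convolution operation, the number of output channels corresponding to the first sub-convolution operation and the number of output channels corresponding to the second sub-convolution operation comprises: determining an overall difference coefficient corresponding to the convolution operation according to the difference between the fixed dilation rate of the convolution operation and the lower-bound dilation rate; and determining, according to the number of output channels of the convolution operation and the overall difference coefficient corresponding to the convolution operation, the number of output channels corresponding to the first sub-convolution operation and the number of output channels corresponding to the second sub-convolution operation.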
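- The method according to any one of claims 1 to 9, further comprising, after the determining the number of output channels corresponding to the first sub-convolution operation and the number of output channels corresponding to the second sub-convolution operation: training the detector with a target training image set to optimize parameters of the detector.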
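- A target detection method, comprising: acquiring an image to be detected; and performing target detection on the image to be detected by using the detector trained according to claim 10, to obtain a target detection result corresponding to the image to be detected.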
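- An apparatus for configuring a detector, comprising: a first determining module, configured to determine a fixed dilation rate of a convolution operation that performs dilated convolution in the detector; a second determining module, configured to, for any convolution operation that performs dilated convolution in the detector, in a case where the fixed dilation rate of the convolution operation satisfies a decomposition condition, decompose the convolution operation into a first sub-convolution operation and a second sub-convolution operation, determine an upper-bound dilation rate and a lower-bound dilation rate corresponding to the fixed dilation rate of the convolution operation, take the upper-bound dilation rate as the dilation rate of the first sub-convolution operation, and take the lower-bound dilation rate as the dilation rate of the second sub-convolution operation; and a third determining module, configured to determine, according to the number of output channels of the convolution operation and the fixed dilation rate of the convolution operation, the number of output channels corresponding to the first sub-convolution operation and the number of output channels corresponding to the second sub-convolution operation.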
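- The apparatus according to claim 12, wherein the detector comprises a backbone network, and the convolution operation that performs dilated convolution in the detector comprises: one or more convolution operations in the backbone network of the detector whose original convolution kernel size is a specified size.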
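- The apparatus according to claim 12 or 13, wherein the detector further comprises a dilation rate learner; and the first determining module comprises: a first determining sub-module, configured to obtain, through the dilation rate learner, first dilation rates of the convolution operation for a plurality of training images; and a second determining sub-module, configured to determine the fixed dilation rate of the convolution operation according to the first dilation rates.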
- The apparatus according to claim 14, wherein the dilation rate learner comprises a global average pooling layer and a fully connected layer.
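- The apparatus according to claim 14 or 15, wherein the first determining sub-module is configured to: for any training image of the plurality of training images, obtain, through the dilation rate learner, a second dilation rate of the convolution operation for the training image; obtain a target detection result corresponding to the training image based on the second dilation rate; update parameters of the dilation rate learner according to the target detection result corresponding to the training image; and obtain, through the dilation rate learner with updated parameters, the first dilation rate of the convolution operation for the training image.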
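- The apparatus according to any one of claims 14 to 16, wherein the second determining sub-module is configured to: determine the average value of the first dilation rates as the fixed dilation rate of the convolution operation.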
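- The apparatus according to any one of claims 12 to 17, wherein the fixed dilation rate of the convolution operation satisfying the decomposition condition comprises any one of the following: the fixed dilation rate of the convolution operation is a non-integer value; or the minimum distance between the fixed dilation rate of the convolution operation and an integer is greater than a first threshold, wherein the minimum distance between the fixed dilation rate of the convolution operation and an integer denotes the distance between the fixed dilation rate of the convolution operation and the integer closest to the fixed dilation rate of the convolution operation.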
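- The apparatus according to any one of claims 12 to 18, wherein the second determining module comprises: a third determining sub-module, configured to determine the integer that is greater than and closest to the fixed dilation rate of the convolution operation as the upper-bound dilation rate corresponding to the fixed dilation rate of the convolution operation; and a fourth determining sub-module, configured to determine the integer that is less than and closest to the fixed dilation rate of the convolution operation as the lower-bound dilation rate corresponding to the fixed dilation rate of the convolution operation.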
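- The apparatus according to any one of claims 12 to 19, wherein the third determining module comprises: a fifth determining sub-module, configured to determine an overall difference coefficient corresponding to the convolution operation according to the difference between the fixed dilation rate of the convolution operation and the lower-bound dilation rate; and a sixth determining sub-module, configured to determine, according to the number of output channels of the convolution operation and the overall difference coefficient corresponding to the convolution operation, the number of output channels corresponding to the first sub-convolution operation and the number of output channels corresponding to the second sub-convolution operation.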
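- The apparatus according to any one of claims 12 to 20, further comprising: a training module, configured to train the detector with a target training image set to optimize parameters of the detector.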
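- A target detection apparatus, comprising: an acquisition module, configured to acquire an image to be detected; and a target detection module, configured to perform target detection on the image to be detected by using the detector trained according to claim 21, to obtain a target detection result corresponding to the image to be detected.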
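- An electronic device, comprising: one or more processors; and a memory associated with the one or more processors, the memory being configured to store executable instructions which, when read and executed by the one or more processors, perform the method according to any one of claims 1 to 11.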
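- A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 11.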
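- A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes instructions for implementing the method according to any one of claims 1 to 11.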
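The configuration steps recited in the claims (averaging per-image dilation rates into a fixed rate, checking the decomposition condition, taking the nearest integers above and below as the upper-bound and lower-bound rates, and splitting the output channels) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `SubConv`, `fixed_dilation_rate`, `needs_decomposition`, and `decompose` are hypothetical, the overall difference coefficient is assumed to equal the fixed rate minus the lower-bound rate, and the sub-convolution with the upper-bound rate is assumed to receive `round(alpha * out_channels)` channels with the remainder going to the other one. The claims themselves do not fix a rounding rule.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class SubConv:
    dilation: int      # integer dilation rate of this sub-convolution
    out_channels: int  # output channels assigned to this sub-convolution


def fixed_dilation_rate(per_image_rates):
    """Average the per-image (first) dilation rates into one fixed rate."""
    return mean(per_image_rates)


def needs_decomposition(rate, threshold=0.0):
    """Decomposition condition: the distance from `rate` to its nearest
    integer exceeds `threshold` (threshold=0.0 means 'rate is non-integer')."""
    return abs(rate - round(rate)) > threshold


def decompose(rate, out_channels, threshold=0.0):
    """Split one dilated convolution into two with integer dilation rates.

    Assumption (not stated in the claims): channels are split in proportion
    to alpha = rate - floor(rate), rounded to the nearest integer.
    """
    if not needs_decomposition(rate, threshold):
        # Close enough to an integer: keep a single convolution.
        return [SubConv(round(rate), out_channels)]
    lower = math.floor(rate)  # lower-bound dilation rate
    upper = math.ceil(rate)   # upper-bound dilation rate
    alpha = rate - lower      # assumed overall difference coefficient
    upper_channels = round(alpha * out_channels)
    lower_channels = out_channels - upper_channels
    return [SubConv(upper, upper_channels), SubConv(lower, lower_channels)]


if __name__ == "__main__":
    for sub in decompose(1.7, 256):
        print(sub)
```

With this split, the channel-weighted average of the two integer rates stays close to the original fractional rate, and the total number of output channels is unchanged. A rate very close to an integer can leave one sub-convolution with zero channels, which is one reason a non-zero threshold may be preferable.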
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021537166A JP2022515274A (ja) | 2019-08-30 | 2019-11-18 | Detector configuration method, detector configuration apparatus, and non-transitory computer-readable storage medium |
SG11202106971YA SG11202106971YA (en) | 2019-08-30 | 2019-11-18 | Configuration method and apparatus for detector, electronic device, and storage medium |
KR1020217023154A KR20210113242A (ko) | 2019-08-30 | 2019-11-18 | Configuration method and apparatus for detector, electronic device, and storage medium |
US17/360,000 US20210326649A1 (en) | 2019-08-30 | 2021-06-28 | Configuration method and apparatus for detector, storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910816321.1A CN110543849B (zh) | 2019-08-30 | 2019-08-30 | Configuration method and apparatus for detector, electronic device, and storage medium |
CN201910816321.1 | 2019-08-30 |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/360,000 Continuation US20210326649A1 (en) | 2019-08-30 | 2021-06-28 | Configuration method and apparatus for detector, storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021036013A1 (zh) | 2021-03-04 |
Family
ID=68711000
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2019/119161 WO2021036013A1 (zh) | 2019-08-30 | 2019-11-18 | Configuration method and apparatus for detector, electronic device, and storage medium |
Country Status (7)
Country | Link |
---|---|
US (1) | US20210326649A1 (zh) |
JP (1) | JP2022515274A (zh) |
KR (1) | KR20210113242A (zh) |
CN (1) | CN110543849B (zh) |
SG (1) | SG11202106971YA (zh) |
TW (1) | TWI733276B (zh) |
WO (1) | WO2021036013A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
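CN113989169A (zh) * | 2020-07-08 | 2022-01-28 | 嘉楠明芯(北京)科技有限公司 | Dilated convolution accelerated computation method and apparatus |
CN112101374B (zh) * | 2020-08-01 | 2022-05-24 | 西南交通大学 | UAV obstacle detection method based on SURF feature detection and ISODATA clustering algorithm |
CN112037157A (zh) * | 2020-09-14 | 2020-12-04 | Oppo广东移动通信有限公司 | Data processing method and apparatus, computer-readable medium, and electronic device |
CN111951269B (zh) * | 2020-10-16 | 2021-01-05 | 深圳云天励飞技术股份有限公司 | Image processing method and related device |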
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6151682A (en) * | 1997-09-08 | 2000-11-21 | Sarnoff Corporation | Digital signal processing circuitry having integrated timing information |
CN107742150A (zh) * | 2016-10-31 | 2018-02-27 | 腾讯科技(深圳)有限公司 | Data processing method and apparatus for convolutional neural network |
CN108960069A (zh) * | 2018-06-05 | 2018-12-07 | 天津大学 | Context enhancement method for single-stage object detector |
US20190147318A1 (en) * | 2017-11-14 | 2019-05-16 | Google Llc | Highly Efficient Convolutional Neural Networks |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
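CN108229478B (zh) * | 2017-06-30 | 2020-12-29 | 深圳市商汤科技有限公司 | Image semantic segmentation and training method and apparatus, electronic device, storage medium, and program |
SG10202108020VA (en) * | 2017-10-16 | 2021-09-29 | Illumina Inc | Deep learning-based techniques for training deep convolutional neural networks |
CN108197606A (zh) * | 2018-01-31 | 2018-06-22 | 浙江大学 | Method for identifying abnormal cells in pathological sections based on multi-scale dilated convolution |
CN108364061B (zh) * | 2018-02-13 | 2020-05-05 | 北京旷视科技有限公司 | Computing apparatus, computation execution device, and computation execution method |
CN108647776A (zh) * | 2018-05-08 | 2018-10-12 | 济南浪潮高新科技投资发展有限公司 | Convolution dilation processing circuit and method for convolutional neural network |
CN109598269A (zh) * | 2018-11-14 | 2019-04-09 | 天津大学 | Semantic segmentation method based on multi-resolution input and pyramid dilated convolution |
CN109886090B (zh) * | 2019-01-07 | 2020-12-04 | 北京大学 | Video pedestrian re-identification method based on multi-time-scale convolutional neural network |
CN109829863B (zh) * | 2019-01-22 | 2021-06-25 | 深圳市商汤科技有限公司 | Image processing method and apparatus, electronic device, and storage medium |
CN110009095B (zh) * | 2019-03-04 | 2022-07-29 | 东南大学 | Efficient road driving area segmentation method based on deep feature compression convolutional network |
CN110009648B (zh) * | 2019-03-04 | 2023-02-24 | 东南大学 | Roadside image vehicle segmentation method based on deep and shallow feature fusion convolutional neural network |
CN110047069B (zh) * | 2019-04-22 | 2021-06-04 | 北京青燕祥云科技有限公司 | Image detection apparatus |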
2019
- 2019-08-30: CN application CN201910816321.1A, publication CN110543849B (zh), status: Active
- 2019-11-18: JP application JP2021537166A, publication JP2022515274A (ja), status: Pending
- 2019-11-18: KR application KR1020217023154A, publication KR20210113242A (ko), status: Application Discontinuation (not active)
- 2019-11-18: SG application SG11202106971YA, publication SG11202106971YA (en), status: unknown
- 2019-11-18: WO application PCT/CN2019/119161, publication WO2021036013A1 (zh), status: Application Filing
- 2019-12-17: TW application TW108146123A, publication TWI733276B (zh), status: Active
2021
- 2021-06-28: US application US17/360,000, publication US20210326649A1 (en), status: Abandoned (not active)
Also Published As
Publication number | Publication date |
---|---|
KR20210113242A (ko) | 2021-09-15 |
CN110543849A (zh) | 2019-12-06 |
TW202109365A (zh) | 2021-03-01 |
SG11202106971YA (en) | 2021-07-29 |
TWI733276B (zh) | 2021-07-11 |
CN110543849B (zh) | 2022-10-04 |
JP2022515274A (ja) | 2022-02-17 |
US20210326649A1 (en) | 2021-10-21 |
Similar Documents
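Publication | Publication Date | Title |
---|---|---|
TWI747325B (zh) | | Target object matching method and target object matching apparatus, electronic device, and computer-readable storage medium |
WO2021036013A1 (zh) | | Configuration method and apparatus for detector, electronic device, and storage medium |
US20210012523A1 (en) | | Pose Estimation Method and Device and Storage Medium |
WO2021155632A1 (zh) | | Image processing method and apparatus, electronic device, and storage medium |
CN113538519B (zh) | | Target tracking method and apparatus, electronic device, and storage medium |
WO2021051650A1 (zh) | | Face and hand association detection method and apparatus, electronic device, and storage medium |
US11301726B2 (en) | | Anchor determination method and apparatus, electronic device, and storage medium |
WO2020134866A1 (zh) | | Key point detection method and apparatus, electronic device, and storage medium |
WO2020155609A1 (zh) | | Target object processing method and apparatus, electronic device, and storage medium |
WO2021036382A1 (zh) | | Image processing method and apparatus, electronic device, and storage medium |
US11216904B2 (en) | | Image processing method and apparatus, electronic device, and storage medium |
CN110458218B (zh) | | Image classification method and apparatus, and classification network training method and apparatus |
WO2021208666A1 (zh) | | Character recognition method and apparatus, electronic device, and storage medium |
KR20210090238A (ko) | | Video processing method and apparatus, electronic device, and storage medium |
CN110532956B (zh) | | Image processing method and apparatus, electronic device, and storage medium |
CN111259967B (zh) | | Image classification and neural network training method, apparatus, device, and storage medium |
TW202127369A (zh) | | Network training method, image generation method, electronic device, and computer-readable storage medium |
CN113065591B (zh) | | Target detection method and apparatus, electronic device, and storage medium |
CN111242303A (zh) | | Network training method and apparatus, and image processing method and apparatus |
TW202044068A (zh) | | Information processing method and apparatus thereof, electronic device, and storage medium |
CN108984628B (zh) | | Method and apparatus for obtaining loss value of content description generation model |
CN111027617A (zh) | | Neural network training and image recognition method, apparatus, device, and storage medium |
WO2022141969A1 (zh) | | Image segmentation method and apparatus, electronic device, storage medium, and program |
CN111988622B (zh) | | Video prediction method and apparatus, electronic device, and storage medium |
KR20210054522A (ko) | | Face recognition method and apparatus, electronic device, and storage medium |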
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19943267; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2021537166; Country of ref document: JP; Kind code of ref document: A |
 | ENP | Entry into the national phase | Ref document number: 20217023154; Country of ref document: KR; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 24.06.2022) |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19943267; Country of ref document: EP; Kind code of ref document: A1 |