WO2021185121A1 - Model generation method, target detection method, apparatus, device and storage medium - Google Patents

Model generation method, target detection method, apparatus, device and storage medium

Info

Publication number
WO2021185121A1
WO2021185121A1 (PCT/CN2021/079690)
Authority
WO
WIPO (PCT)
Prior art keywords
detection model
pruned
target
model
coefficients
Prior art date
Application number
PCT/CN2021/079690
Other languages
English (en)
French (fr)
Inventor
安耀祖
许新玉
孔旗
Original Assignee
北京京东乾石科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东乾石科技有限公司
Priority to KR1020227026698A (published as KR20220116061A)
Priority to EP21771737.0A (published as EP4080408A4)
Priority to US17/912,342 (published as US20230131518A1)
Priority to JP2022544673A (published as JP2023527489A)
Publication of WO2021185121A1

Classifications

    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/0495: Quantised networks; Sparse networks; Compressed networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/0985: Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T2207/30252: Vehicle exterior; Vicinity of vehicle
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V2201/07: Target detection
    • G06V2201/08: Detecting or categorising vehicles

Definitions

  • The embodiments of the present application relate to the field of computer application technology, for example, to a model generation method, a target detection method, an apparatus, a device, and a storage medium.
  • Object detection technology is the basis of many computer vision tasks: it can be used to determine whether a target of interest is present in an image to be detected and to locate that target precisely.
  • Object detection technology can be combined with technologies such as target tracking and target re-identification and applied in fields such as artificial intelligence systems, vehicle automatic driving systems, intelligent robots, and intelligent logistics.
  • the embodiments of the present application provide a model generation method, a target detection method, a device, a device, and a storage medium, so as to achieve the effect of improving the detection speed of the model by compressing the model.
  • In a first aspect, an embodiment of the present application provides a model generation method, which may include:
  • acquiring multiple scaling coefficients of the batch normalization layer in an intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training an original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image;
  • screening out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and
  • from the multiple channels of the intermediate detection model, screening out the channels to be pruned corresponding to the coefficients to be pruned, and performing channel pruning on the channels to be pruned to generate the target detection model.
  • In a second aspect, an embodiment of the present application also provides a target detection method, which may include:
  • acquiring an image to be detected and a target detection model generated by any one of the methods above; and
  • inputting the image to be detected into the target detection model, and obtaining the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
  • an embodiment of the present application also provides a model generation device, which may include:
  • a first acquisition module, configured to acquire multiple scaling coefficients of the batch normalization layer in the intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training the original detection model on multiple training samples,
  • and each training sample includes a sample image and the sample annotation results of known targets in the sample image;
  • a first screening module, configured to screen out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and
  • a model generation module, configured to screen out, from the multiple channels of the intermediate detection model, the channels to be pruned corresponding to the coefficients to be pruned, and to perform channel pruning on the channels to be pruned to generate the target detection model.
  • an embodiment of the present application also provides a target detection device, which may include:
  • the second acquisition module is configured to acquire the image to be detected and the target detection model generated according to any one of the above methods; and
  • the target detection module is configured to input the image to be detected into the target detection model, and obtain the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
  • an embodiment of the present application also provides a device, which may include:
  • at least one processor; and
  • a memory configured to store at least one program;
  • where, when the at least one program is executed by the at least one processor, the at least one processor implements the model generation method or the target detection method provided in any embodiment of the present application.
  • In a sixth aspect, an embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored;
  • when the computer program is executed by a processor, the model generation method or the target detection method provided by any embodiment of the present application is implemented.
  • FIG. 1 is a flowchart of a model generation method in Embodiment 1 of the present application
  • Fig. 2 is a flowchart of a model generation method in the second embodiment of the present application.
  • Fig. 3 is a flowchart of a target detection method in the third embodiment of the present application.
  • FIG. 4a is a flowchart of model compression in a target detection method in Embodiment 3 of the present application.
  • FIG. 4b is a flowchart of model pruning in a target detection method in Embodiment 3 of the present application.
  • Fig. 5 is a structural block diagram of a model generating device in the fourth embodiment of the present application.
  • FIG. 6 is a structural block diagram of a target detection device in Embodiment 5 of the present application.
  • FIG. 7 is a schematic structural diagram of a device in Embodiment 6 of the present application.
  • FIG. 8 is a schematic structural diagram of the target detection system in the ninth embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structure of the unmanned vehicle in the tenth embodiment of the present application.
  • FIG. 1 is a flowchart of a model generation method provided in Embodiment 1 of the present application. This embodiment is applicable to the case of compressing the deep learning model in the target detection technology.
  • the method may be executed by the model generation apparatus provided in the embodiment of the present application, and the apparatus may be implemented by at least one of software and hardware, and the apparatus may be integrated on various electronic devices.
  • the method of the embodiment of the present application includes steps S110 to S130.
  • First, an untrained original detection model is acquired.
  • The original detection model is a deep learning model used for visual detection. Such models can be divided into anchor-based models, anchor-free models, and fusions of the two; the difference between them is whether anchors are used to extract candidate boxes.
  • An anchor, also called an anchor box, is one of a set of rectangular boxes obtained on the training samples by a clustering algorithm before model training.
  • Anchor-based original detection models include Faster R-CNN, SSD (Single Shot MultiBox Detector), YoloV2, YoloV3, and so on; anchor-free original detection models include CornerNet, ExtremeNet, CenterNet, FCOS, and so on;
  • original detection models that fuse anchor-based and anchor-free branches include FSAF, SFace, GA-RPN, and so on.
  • SSD is a one-stage detection model: it has no region-proposal stage and directly generates the category probabilities and position coordinates of the targets to be detected, so it has a clear advantage in detection speed and runs well on unmanned delivery vehicles and mobile terminals. Therefore, as an optional example, the original detection model may be an SSD, and on this basis the backbone network of the SSD may be an inception_v3 structure.
  • Each training sample can include a sample image and a sample annotation result of a known target in the sample image.
  • The sample image may be a frame of an image, a video sequence, and so on, and the sample annotation result may be category probabilities and position coordinates.
  • It should be noted that each convolutional layer in the original detection model is immediately followed by a batch normalization (BN) layer, which normalizes the scale of the convolutional layer's output and thereby helps avoid gradient vanishing and gradient overflow during training. The BN layer includes scaling coefficients (gamma coefficients) and offset coefficients (beta coefficients). In each BN layer, the number of scaling coefficients equals the number of channels in the adjacent convolutional layer, that is, each scaling coefficient corresponds to one channel of the convolutional layer. For example, if a certain BN layer has 32 scaling coefficients, the convolutional layer immediately adjacent to that BN layer includes 32 channels, and the BN layer also includes 32 channels.
  • In both the training stage and the application stage of the original detection model, each scaling coefficient is multiplied with the corresponding channel in the convolutional layer; whether a certain scaling coefficient exists therefore directly determines whether the channel in the corresponding convolutional layer has any effect. Hence, the multiple scaling coefficients of the batch normalization layer in the intermediate detection model can be acquired, and which channels of the intermediate detection model to prune can be determined according to the multiple scaling coefficients.
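As a concrete illustration (a minimal sketch, not part of the patent text), the scaling coefficients can be read directly from a trained PyTorch model; this assumes the model places an nn.BatchNorm2d after each convolution, and the helper name collect_bn_gammas is ours:

```python
import torch.nn as nn

def collect_bn_gammas(model: nn.Module):
    """Collect the absolute scaling (gamma) coefficients of every
    BatchNorm2d layer; in PyTorch, bn.weight holds gamma and bn.bias
    holds beta, one value per channel of the adjacent convolution."""
    gammas = []
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            gammas.append((name, module.weight.detach().abs()))
    return gammas
```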
  • S120: Screen out the coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients.
  • There are several ways to do this: for example, the numerical values of the multiple scaling coefficients can be sorted, their median obtained from the sorting result and used to screen out the coefficients to be pruned; or the mean of the values can be computed and used for the screening.
  • Alternatively, a pruning threshold for the multiple scaling coefficients can be obtained according to their numerical values and a preset pruning rate, and the coefficients to be pruned can be screened out from the multiple scaling coefficients according to the pruning threshold; the coefficients to be pruned may be the scaling coefficients whose values are less than or equal to the pruning threshold; and so on.
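Continuing the sketch above, the threshold-based variant might look as follows; the function name and the simplified edge handling (for example, a pruning rate of 0 still returning the smallest gamma) are our assumptions:

```python
import torch

def pruning_threshold(gammas, prune_rate: float) -> float:
    """Derive a global threshold so that roughly `prune_rate` of all
    gamma coefficients fall at or below it (those channels get pruned)."""
    flat = torch.cat([g.flatten() for _, g in gammas])  # all |gamma| values
    sorted_g, _ = torch.sort(flat)                      # ascending order
    k = int(sorted_g.numel() * prune_rate)              # number to prune
    return sorted_g[max(k - 1, 0)].item()
```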
  • The multiple scaling coefficients correspond one-to-one with the multiple channels of a given convolutional layer, and the multiple channels of a given convolutional layer correspond one-to-one with the multiple channels of the BN layer immediately adjacent to that convolutional layer. Therefore, the channels to be pruned can be screened out from the multiple channels of the intermediate detection model according to the coefficients to be pruned.
  • The channels to be pruned are the channels of lower importance; each may be a channel in a certain convolutional layer or a channel in a certain BN layer.
  • Channel pruning can then be performed on the channels to be pruned to generate the target detection model, thereby achieving the effect of model compression.
  • Channel pruning simplifies the model by deleting redundant channels in the model and is a structured compression method; moreover, after channel pruning is performed on the channels to be pruned, the convolution kernels corresponding to those channels are deleted accordingly, so channel pruning also reduces the amount of convolution computation.
  • For example, if a certain convolutional layer has 32 channels, the BN layer immediately adjacent to it also has 32 channels.
  • Each channel in the BN layer includes a scaling coefficient and an offset coefficient.
  • Since the coefficients to be pruned are screened from the scaling coefficients, it is possible to determine from the coefficients to be pruned which channels in the BN layer are channels to be pruned and, correspondingly, which channels in the convolutional layer are channels to be pruned.
  • Optionally, the channel pruning above may be implemented as follows: from the multiple channels of the multiple convolutional layers of the intermediate detection model, screen out the output channel of the current convolutional layer corresponding to the current pruning coefficient among the multiple coefficients to be pruned,
  • as well as the input channel of the next convolutional layer after the current convolutional layer, and take the output channel of the current convolutional layer and the input channel of the next convolutional layer as channels to be pruned. This is because the output channels of the current convolutional layer are the input channels of the next convolutional layer.
  • For example, if the output channels of the current convolutional layer are 1-32, the input channels of the next convolutional layer are also 1-32.
  • In this case, if output channel 17 of the current convolutional layer, corresponding to the current pruning coefficient, is a channel to be pruned, then input channel 17 of the next convolutional layer is also a channel to be pruned.
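A simplified in-place sketch of this pruning step is shown below; it assumes a plain Conv2d, BatchNorm2d, Conv2d chain (branching SSD graphs need extra index bookkeeping per connection), and the helper name prune_conv_pair is ours:

```python
import torch
import torch.nn as nn

def prune_conv_pair(conv: nn.Conv2d, bn: nn.BatchNorm2d,
                    next_conv: nn.Conv2d, keep: torch.Tensor) -> None:
    """Drop the pruned output channels of the current convolution, the
    matching BN channels, and the corresponding input channels of the
    next convolution. `keep` is a boolean mask over output channels."""
    idx = torch.nonzero(keep).flatten()
    conv.weight = nn.Parameter(conv.weight.data[idx].clone())
    if conv.bias is not None:
        conv.bias = nn.Parameter(conv.bias.data[idx].clone())
    conv.out_channels = idx.numel()
    bn.weight = nn.Parameter(bn.weight.data[idx].clone())
    bn.bias = nn.Parameter(bn.bias.data[idx].clone())
    bn.running_mean = bn.running_mean[idx].clone()
    bn.running_var = bn.running_var[idx].clone()
    bn.num_features = idx.numel()
    # The pruned output channels are exactly the next layer's pruned inputs.
    next_conv.weight = nn.Parameter(next_conv.weight.data[:, idx].clone())
    next_conv.in_channels = idx.numel()
```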
  • In the technical solution of this embodiment, by acquiring the multiple scaling coefficients of the batch normalization layer in the intermediate detection model obtained after preliminary training, the coefficients to be pruned can be screened out from the multiple scaling coefficients according to their numerical values.
  • Because the coefficients to be pruned and the channels to be pruned have a corresponding relationship, the channels to be pruned corresponding to the coefficients to be pruned can be screened out from the multiple channels of the intermediate detection model and pruned to generate the target detection model.
  • This technical solution combines channel pruning with the intermediate detection model and can perform channel pruning on the intermediate detection model according to the scaling coefficients of the intermediate detection model that has completed preliminary training, thereby achieving the effect of improving the model's detection speed by compressing the model.
  • In an optional technical solution, the model generation method above may further include: screening out prunable convolutional layers from the multiple convolutional layers of the intermediate detection model, where the prunable convolutional layers include the convolutional layers other than 1*1 convolutional layers and/or the convolutional layers in the classification-regression branch; and screening out, from the multiple scaling coefficients, the scaling coefficients corresponding to the prunable convolutional layers, the scaling coefficients corresponding to the prunable convolutional layers being the target scaling coefficients. Accordingly, screening out the coefficients to be pruned from the multiple scaling coefficients according to their numerical values may include: screening out the coefficients to be pruned from the multiple target scaling coefficients according to the numerical values of the multiple target scaling coefficients.
  • In general, the original detection model includes two parts: a backbone network and a classification-regression branch.
  • The backbone network can be used to extract feature maps.
  • The classification-regression branch is a classification branch and a regression branch that split off from the backbone network and can be used to classify or regress on the extracted feature maps. Since the categories of classification and regression are usually fixed, the convolutional layers in the classification-regression branch can be kept fixed as far as possible, which ensures a fixed output dimensionality and simplifies the execution code. As a result, the convolutional layers other than at least one of the 1*1 convolutional layers and the convolutional layers in the classification-regression branch can be taken as prunable convolutional layers,
  • and the coefficients to be pruned can be screened out from the multiple target scaling coefficients of the prunable convolutional layers.
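By way of illustration, the layer selection could be done as follows; the head-prefix naming convention is hypothetical, not taken from the patent:

```python
import torch.nn as nn

def prunable_conv_layers(model: nn.Module,
                         head_prefixes=("cls_head", "reg_head")):
    """Select convolutions eligible for pruning, skipping 1*1
    convolutions and the classification/regression branch."""
    layers = []
    for name, m in model.named_modules():
        if not isinstance(m, nn.Conv2d):
            continue
        if m.kernel_size == (1, 1):          # keep 1*1 convolutions intact
            continue
        if name.startswith(head_prefixes):   # keep the detection head intact
            continue
        layers.append((name, m))
    return layers
```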
  • In an optional technical solution, after channel pruning is performed on the channels to be pruned, a pruned detection model may first be generated; the pruned detection model may then be fine-tuned to generate the target detection model.
  • That is, the simplified pruned detection model obtained by channel pruning can be fine-tuned to restore the detection effect, preserving the original performance of the model as far as possible while compressing the model.
  • The fine-tuning process may be: acquire historical images and the historical annotation results of known targets in the historical images, and take the historical images and historical annotation results as sets of historical samples; then train the pruned detection model on multiple historical samples to obtain the target detection model.
  • In general, the historical samples and the training samples above are the same sample data; that is, during fine-tuning, the historical images may be the same images as the sample images, and the historical annotation results may also be the same annotation results as the sample annotation results.
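A minimal fine-tuning loop under these conditions might look like this; it assumes, hypothetically, that the pruned model's forward pass returns the detection loss when given images and targets, and the optimizer settings are illustrative only:

```python
import torch

def finetune(pruned_model, loader, epochs=10, lr=1e-4):
    """Fine-tune the pruned detection model on the original training
    samples to restore detection accuracy."""
    opt = torch.optim.SGD(pruned_model.parameters(), lr=lr, momentum=0.9)
    pruned_model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss = pruned_model(images, targets)  # forward returns the loss
            opt.zero_grad()
            loss.backward()
            opt.step()
```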
  • Fig. 2 is a flowchart of a model generation method provided in Embodiment 2 of the present application. This embodiment is a refinement of the technical solutions above.
  • In this embodiment, optionally, the model generation method above may further include: acquiring multiple training samples, and performing sparsity training based on the batch normalization layer on the original detection model using the multiple training samples to obtain the intermediate detection model.
  • explanations of terms that are the same as or corresponding to those in the foregoing embodiments will not be repeated here.
  • the method of this embodiment may include steps S210 to S230.
  • When BN-layer-based sparsity training is performed on the original detection model, an intermediate detection model with a sparsified BN layer can be obtained; that is, sparsity is introduced into the dense connections of the original detection model.
  • An optional scheme for BN-layer-based sparsity training is to impose an L1 regularization constraint on each scaling coefficient in the original detection model, so that the original detection model adjusts its parameters in the direction of structural sparsity.
  • The scaling coefficients (gamma coefficients) in the BN layers then play a role equivalent to switch coefficients on the information-flow channels, controlling whether each information-flow channel is open or closed.
  • The reason for this setting is that imposing the L1 regularization constraint on the scaling coefficients during model training can adjust more scaling coefficients to zero. In the model training stage and the application stage, because each scaling coefficient is multiplied with the corresponding channel in the convolutional layer, when more scaling coefficients are 0 the channels in the corresponding convolutional layers no longer have any effect; heavily compressing the scaling coefficients therefore itself plays the role of channel pruning. On this basis, when the channels to be pruned are screened according to the preset pruning rate, the more zero-valued scaling coefficients there are in the intermediate detection model, the lower the probability that channels corresponding to non-zero scaling coefficients will be pruned, and the more consistent the network structure of the generated target detection model is with the network structure of the intermediate detection model. The detection performance of the two is then also more consistent; that is, the effect of model compression is achieved while the detection performance is guaranteed.
  • On this basis, the objective loss function in the original detection model may be composed of the original loss function and an L1 regularization function,
  • where the L1 regularization function may include a loss term that imposes the L1 constraint on the multiple scaling coefficients. That is, on the basis of the original loss function, an L1 regularization term on the scaling coefficients of the BN layers is introduced. In this way, during training, a minimum can be solved for according to the objective loss function, and the multiple parameter values in the model can be adjusted according to the solution result.
  • Optionally, the objective loss function L can be expressed by the following formula:

    L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

  • where x is the sample image, y is the sample annotation result of the sample image, W denotes the parameter values in the original detection model, f(x, W) is the sample prediction result of the known targets in the sample image, γ is a scaling coefficient, λ is the penalty coefficient, l(·) is the original loss function, g(·) is the L1 regularization function, and Γ represents the set of all scaling coefficients in the original detection model.
  • Moreover, because the L1 regularization function is imposed only on the scaling coefficients of the BN layers, when the gradients are updated in back-propagation, the gradient of each scaling coefficient, γ_grad, needs to be augmented by the product of the sign of the scaling coefficient, sign(γ), and the penalty coefficient λ:

    γ_grad = γ_grad + λ · sign(γ)
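In PyTorch this update is commonly realised by adjusting the BN weight gradients between the backward pass and the optimizer step; the following is a sketch under that assumption:

```python
import torch
import torch.nn as nn

def add_l1_subgradient(model: nn.Module, lam: float) -> None:
    """Apply gamma_grad = gamma_grad + lam * sign(gamma) to every BN
    scaling coefficient; call after loss.backward() and before
    optimizer.step()."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.data))
```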
  • S220: Acquire the multiple scaling coefficients of the batch normalization layer in the intermediate detection model, and screen out the coefficients to be pruned from the multiple scaling coefficients according to their numerical values.
  • In the technical solution of this embodiment, BN-layer-based sparsity training is performed on the original detection model using multiple training samples to obtain an intermediate detection model with a sparsified BN layer. The network structure of the target detection model generated after channel pruning of the intermediate detection model is more consistent with that of the intermediate detection model, and the detection performance of the two is relatively consistent; that is, the effect of model compression is achieved while the detection performance is guaranteed.
  • FIG. 3 is a flowchart of a target detection method provided in Embodiment 3 of the present application. This embodiment can be applied to a situation where a target detection model generated based on the method described in any of the above embodiments is used to perform target detection on an image to be detected.
  • the method may be executed by the target detection device provided in the embodiment of the present application, the device may be implemented by at least one of software and hardware, and the device may be integrated on various electronic devices.
  • the method of the embodiment of the present application includes steps S310 to S320.
  • The image to be detected may be a frame of an image, a video sequence, and so on.
  • the target detection model may be a visual detection model generated according to the method described in any of the foregoing embodiments.
  • a method for generating a target detection model may be as shown in Figure 4a:
  • First, starting from the original SSD detection model with the inception_v3 structure as the backbone network, during model training and while keeping the original parameter settings, L1 regularization constraints are imposed on the gamma coefficients in the BN layers immediately adjacent to the convolutional layers, so that the model adjusts its parameters in the direction of structural sparsity, thereby sparsifying the BN layers.
  • Next, after the BN-layer-based sparsity training is completed, the channels in the corresponding convolutional layers and BN layers of the preliminarily trained intermediate detection model can be cut according to the BN-layer scaling coefficients at the preset pruning rate, which streamlines the model and increases the detection speed. Finally, the streamlined model obtained by channel pruning is fine-tuned to restore the detection effect.
  • The channel pruning process above can be as shown in FIG. 4b. First, it must be determined which convolutional layers to prune; in this embodiment there are two restrictions: neither 1*1 convolutional layers nor the convolutional layers in the classification-regression branch are pruned, which ensures that the output dimensionality remains unchanged.
  • Second, the gamma coefficients in the BN layers corresponding to the remaining prunable convolutional layers are collected; all collected gamma coefficients are sorted, and the pruning threshold of the gamma coefficients is computed according to the preset pruning rate. Third, the channels in the convolutional layers and BN layers are selected according to the pruning threshold, keeping the channels corresponding to gamma coefficients whose values are greater than the threshold; this determines the mask (MASK) of the channels that can be retained in each BN layer and in the convolutional layer immediately adjacent to it. Finally, the corresponding channels in the convolutional layers and BN layers are retained according to the MASK, and the channels not retained are pruned away.
  • S320: Input the image to be detected into the target detection model, and obtain the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
  • As an optional example, the target detection method above can be applied to visual target detection on unmanned delivery vehicles in the field of intelligent logistics.
  • Although the on-board processors of unmanned delivery vehicles are mostly built on the Xavier platform, whose computing resources are relatively limited, the target detection model involved in the target detection method above is small in scale and fast in detection, so even under the constraints of limited computing resources it can still support truly unmanned operation of unmanned delivery vehicles.
  • Moreover, because the structured pruning operation is implemented at the channel level, the resulting streamlined model can run directly on mature frameworks such as PyTorch, MXNet, and TensorFlow, or on hardware platforms such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), without the support of a special algorithm library, making it more convenient to apply.
  • To test the detection accuracy, this method was applied to the 5-category subset (car, pedestrian, truck, bus, rider) of the Berkeley DeepDrive (BDD) dataset of the University of California, Berkeley; the quantitative results are shown in two tables in the original publication. From the data in the tables, it can be seen that the structured-pruning target detection method of the embodiments of the present application can achieve a fairly obvious compression effect while leaving part of the convolutional layers and BN layers intact, while the detection result, mAP (the average over the 5 category subsets), shows only a slight drop.
  • In the technical solution of the embodiments of the present application, target detection can be performed on the image to be detected based on the generated target detection model. Because the target detection model is a streamlined model obtained by model compression, the detection speed of the target to be detected in the image to be detected can be effectively improved, and the original performance of the model can be preserved as far as possible.
  • FIG. 5 is a structural block diagram of a model generation device provided in Embodiment 4 of the application, and the device is configured to execute the model generation method provided in any of the foregoing embodiments.
  • This device and the model generation method of the foregoing embodiments belong to the same inventive concept.
  • the device may include: a first acquisition module 410, a first screening module 420, and a model generation module 430.
  • the first acquisition module 410 is configured to acquire multiple scaling coefficients of the batch normalization layer in the intermediate detection model after preliminary training.
  • the intermediate detection model is obtained after training the original detection model based on multiple training samples.
  • each training sample includes a sample image and the sample annotation results of the known targets in the sample image;
  • the first screening module 420 is configured to screen out the coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients;
  • the model generation module 430 is configured to filter the channels to be pruned corresponding to the coefficients to be pruned from the multiple channels of the intermediate detection model, and perform channel pruning on the channels to be pruned to generate the target detection model.
  • Optionally, the first screening module 420 may be configured to:
  • obtain a pruning threshold of the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients and a preset pruning rate, and screen out the coefficients to be pruned from the multiple scaling coefficients according to the pruning threshold.
  • the device may further include:
  • a second screening module, configured to screen out prunable convolutional layers from the multiple convolutional layers of the intermediate detection model,
  • where the prunable convolutional layers include the convolutional layers other than at least one of the 1*1 convolutional layers and the convolutional layers in the classification-regression branch;
  • a third screening module, configured to screen out, from the multiple scaling coefficients, the scaling coefficients corresponding to the prunable convolutional layers, the scaling coefficients corresponding to the prunable convolutional layers being the target scaling coefficients.
  • In this case, the first screening module 420 may be configured to:
  • screen out the coefficients to be pruned from the multiple target scaling coefficients according to the numerical values of the multiple target scaling coefficients.
  • model generation module 430 may include:
  • a to-be-pruned-channel screening unit, configured to screen out, from the multiple channels of the multiple convolutional layers of the intermediate detection model, the output channel of the current convolutional layer corresponding to the current pruning coefficient among the multiple coefficients to be pruned, as well as the input channel of the next convolutional layer after the current convolutional layer, and to take the output channel and the input channel as channels to be pruned.
  • the device may further include:
  • the third acquisition module is configured to acquire multiple training samples, and perform sparse training on the original detection model based on the batch normalization layer based on the multiple training samples to obtain an intermediate detection model.
  • Optionally, the objective loss function in the original detection model is composed of the original loss function and an L1 regularization function,
  • where the L1 regularization function includes a loss term that imposes the L1 constraint on the multiple scaling coefficients.
  • Optionally, the objective loss function L is expressed by the following formula:

    L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

  • where x is the sample image, y is the sample annotation result of the sample image, W denotes the parameter values in the original detection model, f(x, W) is the sample prediction result of the known targets in the sample image, γ is a scaling coefficient, λ is the penalty coefficient, l(·) is the original loss function, g(·) is the L1 regularization function, and Γ represents the set of all scaling coefficients in the original detection model.
  • model generation module 430 may include:
  • the channel pruning unit is set to perform channel pruning on the channel to be pruned to obtain a pruning detection model
  • the fine-tuning training unit is set to fine-tune the pruning detection model to generate a target detection model.
  • In the model generation apparatus, the first acquisition module acquires the multiple scaling coefficients of the batch normalization layer in the intermediate detection model obtained after preliminary training; the first screening module can screen out the coefficients to be pruned from the multiple scaling coefficients according to their numerical values; and, because the coefficients to be pruned and the channels to be pruned have a corresponding relationship, the model generation module can screen out the channels to be pruned corresponding to the coefficients to be pruned from the multiple channels of the intermediate detection model
  • and perform channel pruning on the channels to be pruned to generate the target detection model.
  • The apparatus above combines channel pruning with the intermediate detection model and can perform channel pruning on the intermediate detection model according to the scaling coefficients of the intermediate detection model that has completed preliminary training, thereby achieving the effect of improving the model's detection speed by compressing the model.
  • the model generation device provided in the embodiment of the present application can execute the model generation method provided in any embodiment of the present application, and has functional modules corresponding to the execution method.
  • FIG. 6 is a structural block diagram of a target detection device provided in Embodiment 5 of this application.
  • the device is configured to execute the target detection method provided in any of the foregoing embodiments.
  • This device belongs to the same inventive concept as the target detection method of the foregoing embodiments.
  • the device may include: a second acquisition module 510 and a target detection module 520.
  • The second acquisition module 510 is configured to acquire the image to be detected and the target detection model generated by the method of either Embodiment 1 or Embodiment 2;
  • the target detection module 520 is configured to input the image to be detected into the target detection model, and obtain the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
  • In the target detection apparatus, the second acquisition module and the target detection module cooperate to perform target detection on the image to be detected based on the generated target detection model.
  • Because the target detection model is a streamlined model obtained by model compression, the detection speed of the target to be detected in the image to be detected can be effectively improved, and the original performance of the model can be preserved as far as possible.
  • the target detection device provided in the embodiment of the present application can execute the target detection method provided in any embodiment of the present application, and has functional modules corresponding to the execution method.
  • the units and modules included are divided only according to functional logic, but are not limited to the division above, as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from each other and are not used to limit the scope of protection of this application.
  • FIG. 7 is a schematic structural diagram of a device provided in Embodiment 6 of this application.
  • the device includes a memory 610, a processor 620, an input device 630, and an output device 640.
  • The number of processors 620 in the device may be at least one;
  • one processor 620 is taken as an example in FIG. 7. The memory 610, the processor 620, the input device 630, and the output device 640 in the device may be connected by a bus or in other ways; in FIG. 7, connection via the bus 650 is taken as an example.
  • As a computer-readable storage medium, the memory 610 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the model generation method in the embodiments of the present application (for example, the first acquisition module 410, the first screening module 420, and the model generation module 430 in the model generation apparatus), or the program instructions/modules corresponding to the target detection method (for example, the second acquisition module 510 and the target detection module 520 in the target detection apparatus).
  • the processor 620 executes various functional applications and data processing of the device by running software programs, instructions, and modules stored in the memory 610, that is, implements the aforementioned model generation method or target detection method.
  • the memory 610 may mainly include a program storage area and a data storage area.
  • the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the device, and the like.
  • the memory 610 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 610 may include memories remotely located relative to the processor 620, and these remote memories may be connected to the device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 630 may be configured to receive input numeric or character information, and generate key signal input related to user settings and function control of the device.
  • the output device 640 may include a display device such as a display screen.
  • Embodiment 7 of the present application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a model generation method, the method including:
  • acquiring multiple scaling coefficients of the batch normalization layer in an intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training an original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image;
  • screening out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and
  • from the multiple channels of the intermediate detection model, screening out the channels to be pruned corresponding to the coefficients to be pruned, and performing channel pruning on the channels to be pruned to generate the target detection model.
  • Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present application, the computer-executable instructions are not limited to the method operations above and can also execute related operations in the model generation method provided in any embodiment of the present application.
  • Embodiment 8 of the present application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute a target detection method, the method including:
  • acquiring an image to be detected and a target detection model generated by the method of either Embodiment 1 or Embodiment 2; and
  • inputting the image to be detected into the target detection model, and obtaining the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
  • An embodiment of the present application also provides a target detection system.
  • Referring to FIG. 8, the system includes a collection device 710, a computing device 720, and a storage device 730.
  • The storage device 730 stores the target detection model generated in Embodiment 1 or Embodiment 2;
  • the collection device 710 is configured to collect images to be detected; and
  • the computing device 720 is configured to load the image to be detected and the target detection model, input the image to be detected into the target detection model, and obtain the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
  • An embodiment of the present application also provides an unmanned vehicle. Referring to FIG. 9, the unmanned vehicle includes a driving device 810, a path planning device 820, and the device 830 described in Embodiment 6.
  • The driving device 810 is configured to drive the unmanned vehicle to run along the path planned by the path planning device 820.
  • The device 830 described in Embodiment 6 is configured to detect the target to be detected in the image to be detected, and the path planning device 820 is configured
  • to plan the path of the unmanned vehicle according to the detection result, produced by the device described in Embodiment 6, for the target to be detected in the image to be detected.
  • this application can be implemented by software and necessary general-purpose hardware, and of course, it can also be implemented by hardware.
  • Based on this understanding, the technical solution of this application, in essence or in the part contributing to the related technology, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute the methods described in the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

A model generation method, a target detection method, an apparatus, a device, and a storage medium. The method includes: acquiring multiple scaling coefficients of the batch normalization layer in an intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training an original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image; screening out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and, from the multiple channels of the intermediate detection model, screening out the channels to be pruned corresponding to the coefficients to be pruned, and performing channel pruning on the channels to be pruned to generate a target detection model.

Description

Model generation method, target detection method, apparatus, device and storage medium
This application claims priority to Chinese patent application No. 202010188303.6, filed with the Chinese Patent Office on March 17, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the field of computer application technology, for example, to a model generation method, a target detection method, an apparatus, a device, and a storage medium.
Background
Object detection technology is the basis of many computer vision tasks: it can be used to determine whether a target of interest is present in an image to be detected and to locate that target precisely. Moreover, object detection technology can be combined with technologies such as target tracking and target re-identification and applied in fields such as artificial intelligence systems, vehicle automatic driving systems, intelligent robots, and intelligent logistics.
In the course of implementing the present application, the inventors found the following situation in the related art: object detection in the related art is mostly implemented with deep learning models, and deep learning models, because of their large size, tend to suffer from slow detection speed; this is especially evident on devices with limited computing resources, which makes it difficult for object detection technology to be applied directly in real projects.
For example, in the field of intelligent logistics, the large-scale use of unmanned delivery vehicles can reduce delivery costs and improve delivery efficiency, and vision-based object detection is a very important technical means by which an unmanned delivery vehicle perceives its surroundings. However, for reasons of mass production and cost, the on-board processors of unmanned delivery vehicles are mostly built on the Xavier platform, whose computing resources are relatively limited. As a result, the detection speed of a deep learning model running on such an on-board processor is relatively slow, which directly affects the environment perception capability of the unmanned delivery vehicle and, in turn, its delivery efficiency. How to improve the detection speed of deep learning models is therefore crucial to the development of the intelligent logistics field.
Summary
Embodiments of the present application provide a model generation method, a target detection method, an apparatus, a device, and a storage medium, so as to achieve the effect of improving a model's detection speed by compressing the model.
In a first aspect, an embodiment of the present application provides a model generation method, which may include:
acquiring multiple scaling coefficients of the batch normalization layer in an intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training an original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image;
screening out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and
from the multiple channels of the intermediate detection model, screening out the channels to be pruned corresponding to the coefficients to be pruned, and performing channel pruning on the channels to be pruned to generate a target detection model.
In a second aspect, an embodiment of the present application further provides a target detection method, which may include:
acquiring an image to be detected and a target detection model generated by any one of the methods above; and
inputting the image to be detected into the target detection model, and obtaining the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
In a third aspect, an embodiment of the present application further provides a model generation apparatus, which may include:
a first acquisition module, configured to acquire multiple scaling coefficients of the batch normalization layer in an intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training an original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image;
a first screening module, configured to screen out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and
a model generation module, configured to screen out, from the multiple channels of the intermediate detection model, the channels to be pruned corresponding to the coefficients to be pruned, and to perform channel pruning on the channels to be pruned to generate a target detection model.
In a fourth aspect, an embodiment of the present application further provides a target detection apparatus, which may include:
a second acquisition module, configured to acquire an image to be detected and a target detection model generated by any one of the methods above; and
a target detection module, configured to input the image to be detected into the target detection model, and to obtain the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
In a fifth aspect, an embodiment of the present application further provides a device, which may include:
at least one processor; and
a memory configured to store at least one program;
where, when the at least one program is executed by the at least one processor, the at least one processor implements the model generation method or the target detection method provided in any embodiment of the present application.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the model generation method or the target detection method provided in any embodiment of the present application is implemented.
Brief Description of the Drawings
FIG. 1 is a flowchart of a model generation method in Embodiment 1 of the present application;
FIG. 2 is a flowchart of a model generation method in Embodiment 2 of the present application;
FIG. 3 is a flowchart of a target detection method in Embodiment 3 of the present application;
FIG. 4a is a flowchart of model compression in a target detection method in Embodiment 3 of the present application;
FIG. 4b is a flowchart of model pruning in a target detection method in Embodiment 3 of the present application;
FIG. 5 is a structural block diagram of a model generation apparatus in Embodiment 4 of the present application;
FIG. 6 is a structural block diagram of a target detection apparatus in Embodiment 5 of the present application;
FIG. 7 is a schematic structural diagram of a device in Embodiment 6 of the present application;
FIG. 8 is a schematic structural diagram of the target detection system in Embodiment 9 of the present application;
FIG. 9 is a schematic structural diagram of the unmanned vehicle in Embodiment 10 of the present application.
Detailed Description
Embodiment 1
FIG. 1 is a flowchart of a model generation method provided in Embodiment 1 of the present application. This embodiment is applicable to compressing a deep learning model used in object detection. The method may be executed by the model generation apparatus provided in the embodiments of the present application; the apparatus may be implemented in at least one of software and hardware, and may be integrated on various electronic devices.
Referring to FIG. 1, the method of this embodiment of the present application includes steps S110 to S130.
S110: Acquire multiple scaling coefficients of the batch normalization layer in an intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training an original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image.
Here, an untrained original detection model is acquired. The original detection model is a deep learning model used for visual detection; such models can be divided into anchor-based models, anchor-free models, and fusions of the two, the difference being whether anchors are used to extract candidate boxes. An anchor, also called an anchor box, is one of a set of rectangular boxes obtained on the training samples by a clustering algorithm before model training.
Anchor-based original detection models include Faster R-CNN, SSD (Single Shot MultiBox Detector), YoloV2, YoloV3, and so on; anchor-free original detection models include CornerNet, ExtremeNet, CenterNet, FCOS, and so on; original detection models fusing anchor-based and anchor-free branches include FSAF, SFace, GA-RPN, and so on. SSD is a one-stage detection model: it has no region-proposal stage and directly generates the category probabilities and position coordinates of the targets to be detected, so it has a clear advantage in detection speed and runs well on unmanned delivery vehicles and mobile terminals. Thus, as an optional example, the original detection model may be an SSD, and on this basis the backbone network of the SSD may be an inception_v3 structure.
Accordingly, after the original detection model is trained on multiple training samples, the intermediate detection model that has completed preliminary training can be obtained. Each training sample may include a sample image and the sample annotation results of known targets in the sample image; the sample image may be a frame of an image, a video sequence, and so on, and the sample annotation results may be category probabilities and position coordinates.
It should be noted that each convolutional layer in the original detection model is immediately followed by a batch normalization (BN) layer, which normalizes the scale of the output of each convolutional layer and thereby avoids gradient vanishing and gradient overflow during training. The BN layer includes scaling coefficients (gamma coefficients) and offset coefficients (beta coefficients); in each BN layer, the number of scaling coefficients equals the number of channels in the convolutional layer immediately adjacent to that BN layer, that is, each scaling coefficient corresponds to one channel of the convolutional layer. For example, if a certain BN layer has 32 scaling coefficients, the convolutional layer immediately adjacent to that BN layer includes 32 channels, and the BN layer also includes 32 channels. Moreover, in both the training stage and the application stage of the original detection model, each scaling coefficient is multiplied with the corresponding channel in the convolutional layer, so whether a certain scaling coefficient exists directly determines whether the channel in the corresponding convolutional layer has any effect. Therefore, the multiple scaling coefficients of the batch normalization layer in the intermediate detection model can be acquired, and which channels of the intermediate detection model to prune can be determined according to the multiple scaling coefficients.
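For reference (the patent text does not reproduce it), the standard per-channel BN transform that ties the two kinds of coefficients together is:

```latex
\hat{x} = \frac{x - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}},
\qquad y = \gamma \hat{x} + \beta
```

where \mu_{\mathcal{B}} and \sigma_{\mathcal{B}}^{2} are the mini-batch mean and variance, \epsilon is a small constant, \gamma is the scaling coefficient, and \beta is the offset coefficient; a channel whose \gamma reaches 0 contributes only the constant \beta.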
S120: Screen out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients.
There are several ways to screen out the coefficients to be pruned from the multiple scaling coefficients according to their numerical values. For example, the numerical values of the multiple scaling coefficients may be sorted, their median obtained from the sorting result, and the coefficients to be pruned screened out from the multiple scaling coefficients according to the median; or the mean of the numerical values of the multiple scaling coefficients may be computed, and the coefficients to be pruned screened out from the multiple scaling coefficients according to the mean; or a pruning threshold of the multiple scaling coefficients may be obtained according to their numerical values and a preset pruning rate, and the coefficients to be pruned screened out from the multiple scaling coefficients according to the pruning threshold, where the coefficients to be pruned may be the scaling coefficients whose values are less than or equal to the pruning threshold; and so on.
S130: From the multiple channels of the intermediate detection model, screen out the channels to be pruned corresponding to the coefficients to be pruned, and perform channel pruning on the channels to be pruned to generate the target detection model.
The multiple scaling coefficients correspond one-to-one with the multiple channels of a given convolutional layer, and the multiple channels of a given convolutional layer correspond one-to-one with the multiple channels of the BN layer immediately adjacent to that convolutional layer. Accordingly, the channels to be pruned can be screened out from the multiple channels of the intermediate detection model according to the coefficients to be pruned. The channels to be pruned are the channels of lower importance; they may be channels in a certain convolutional layer or channels in a certain BN layer.
Channel pruning can then be performed on the channels to be pruned to generate the target detection model, thereby achieving the effect of model compression. Channel pruning simplifies the model by deleting redundant channels in the model and is a structured compression method; moreover, after channel pruning is performed on the channels to be pruned, the convolution kernels corresponding to those channels are deleted accordingly, so channel pruning also reduces the amount of convolution computation. For example, if a certain convolutional layer has 32 channels, the BN layer immediately adjacent to it also has 32 channels, and each channel of the BN layer includes a scaling coefficient and an offset coefficient; since the coefficients to be pruned are screened from the scaling coefficients, it can be determined from the coefficients to be pruned which channels of the BN layer are channels to be pruned and, correspondingly, which channels of the convolutional layer are channels to be pruned.
Optionally, the channel pruning above may be implemented as follows: from the multiple channels of the multiple convolutional layers of the intermediate detection model, screen out the output channel of the current convolutional layer corresponding to the current pruning coefficient among the multiple coefficients to be pruned, as well as the input channel of the next convolutional layer after the current convolutional layer, and take the output channel of the current convolutional layer and the input channel of the next convolutional layer as channels to be pruned. This is because the output channels of the current convolutional layer are the input channels of the next convolutional layer: for example, if the output channels of the current convolutional layer are 1-32, the input channels of the next convolutional layer are also 1-32; in this case, if output channel 17 of the current convolutional layer, corresponding to the current pruning coefficient, is a channel to be pruned, then input channel 17 of the next convolutional layer is also a channel to be pruned.
In the technical solution of this embodiment of the present application, by acquiring the multiple scaling coefficients of the batch normalization layer in the intermediate detection model obtained after preliminary training, the coefficients to be pruned can be screened out from the multiple scaling coefficients according to their numerical values; because the coefficients to be pruned and the channels to be pruned have a corresponding relationship, the channels to be pruned corresponding to the coefficients to be pruned can be screened out from the multiple channels of the intermediate detection model and pruned to generate the target detection model. This technical solution combines channel pruning with the intermediate detection model and can perform channel pruning on the intermediate detection model according to the scaling coefficients of the intermediate detection model that has completed preliminary training, thereby achieving the effect of improving the model's detection speed by compressing the model.
In an optional technical solution, the model generation method above may further include: screening out prunable convolutional layers from the multiple convolutional layers of the intermediate detection model, where the prunable convolutional layers include the convolutional layers other than 1*1 convolutional layers and/or the convolutional layers in the classification-regression branch; and screening out, from the multiple scaling coefficients, the scaling coefficients corresponding to the prunable convolutional layers, the scaling coefficients corresponding to the prunable convolutional layers being the target scaling coefficients. Accordingly, screening out the coefficients to be pruned from the multiple scaling coefficients according to their numerical values may include: screening out the coefficients to be pruned from the multiple target scaling coefficients according to the numerical values of the multiple target scaling coefficients.
In general, the original detection model includes two parts, a backbone network and a classification-regression branch. The backbone network can be used to extract feature maps; the classification-regression branch is a classification branch and a regression branch that split off from the backbone network and can be used to classify or regress on the extracted feature maps. Since the categories of classification and regression are usually fixed, the convolutional layers in the classification-regression branch can be kept fixed as far as possible, which ensures a fixed output dimensionality and simplifies the execution code. Accordingly, the convolutional layers other than at least one of the 1*1 convolutional layers and the convolutional layers in the classification-regression branch can be taken as prunable convolutional layers, and the coefficients to be pruned can be screened out from the multiple target scaling coefficients of the prunable convolutional layers.
In an optional technical solution, after channel pruning is performed on the channels to be pruned, a pruned detection model may first be generated; the pruned detection model may then be fine-tuned to generate the target detection model. In other words, the simplified pruned detection model obtained by channel pruning can be fine-tuned to restore the detection effect, that is, to preserve the original performance of the model as far as possible while compressing the model. The fine-tuning process may be: acquire historical images and the historical annotation results of known targets in the historical images, and take a historical image and its historical annotation results as a set of historical samples; then train the pruned detection model on multiple historical samples to obtain the target detection model. It should be noted that, in general, the historical samples and the training samples above are the same sample data; that is, during fine-tuning, the historical images may be the same images as the sample images, and the historical annotation results may also be the same annotation results as the sample annotation results.
Embodiment 2
FIG. 2 is a flowchart of a model generation method provided in Embodiment 2 of the present application. This embodiment is a refinement of the technical solutions above. In this embodiment, optionally, the model generation method may further include: acquiring multiple training samples, and performing sparsity training based on the batch normalization layer on the original detection model using the multiple training samples to obtain the intermediate detection model. Explanations of terms identical or corresponding to those of the embodiments above are not repeated here.
Referring to FIG. 2, the method of this embodiment may include steps S210 to S230.
S210: Acquire multiple training samples, and perform sparsity training based on the batch normalization layer on the original detection model using the multiple training samples to obtain the intermediate detection model, where each training sample includes a sample image and the sample annotation results of known targets in the sample image.
When BN-layer-based sparsity training is performed on the original detection model, an intermediate detection model with a sparsified BN layer can be obtained; that is, sparsity is introduced into the dense connections of the original detection model. For example, an optional scheme for BN-layer-based sparsity training is to impose an L1 regularization constraint on each scaling coefficient in the original detection model, so that the original detection model adjusts its parameters in the direction of structural sparsity; the scaling coefficients (gamma coefficients) in the BN layers then play a role equivalent to switch coefficients on the information-flow channels, controlling whether each information-flow channel is open or closed.
The reason for this setting is that imposing the L1 regularization constraint on the scaling coefficients during model training can adjust more scaling coefficients to 0. Thus, in the model training stage and the application stage, because each scaling coefficient is multiplied with the corresponding channel in the convolutional layer, when more scaling coefficients are 0 the channels in the corresponding convolutional layers no longer have any effect; heavily compressing the scaling coefficients therefore itself plays the role of channel pruning. On this basis, when the channels to be pruned are screened according to the preset pruning rate, the more zero-valued scaling coefficients there are in the intermediate detection model, the lower the probability that channels corresponding to non-zero scaling coefficients will be pruned, and the more consistent the network structure of the generated target detection model is with the network structure of the intermediate detection model; the detection performance of the two is then also more consistent, that is, the effect of model compression is achieved while the detection performance is guaranteed.
On this basis, the objective loss function in the original detection model may be composed of the original loss function and an L1 regularization function, where the L1 regularization function may include a loss term that imposes the L1 constraint on the multiple scaling coefficients; that is, on the basis of the original loss function, an L1 regularization term on the scaling coefficients of the BN layers is introduced. In this way, during training, a minimum can be solved for according to the objective loss function, and the multiple parameter values in the model can be adjusted according to the solution result.
Optionally, the objective loss function L can be expressed by the following formula:

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

where x is the sample image, y is the sample annotation result of the sample image, W denotes the parameter values in the original detection model, f(x, W) is the sample prediction result of the known targets in the sample image, γ is a scaling coefficient, λ is the penalty coefficient, l(·) is the original loss function, g(·) is the L1 regularization function, and Γ represents the set of all scaling coefficients in the original detection model. Moreover, because the L1 regularization function is imposed only on the scaling coefficients of the BN layers, when the gradients are updated in back-propagation, the gradient of each scaling coefficient, γ_grad, needs to be augmented by the product of the sign of the scaling coefficient, sign(γ), and the penalty coefficient λ, as shown in the following formula:

γ_grad = γ_grad + λ · sign(γ)
S220: Acquire the multiple scaling coefficients of the batch normalization layer in the intermediate detection model, and screen out the coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients.
S230: From the multiple channels of the intermediate detection model, screen out the channels to be pruned corresponding to the coefficients to be pruned, and perform channel pruning on the channels to be pruned to generate the target detection model.
In the technical solution of this embodiment of the present application, BN-layer-based sparsity training is performed on the original detection model using multiple training samples, so that an intermediate detection model with a sparsified BN layer can be obtained; the network structure of the target detection model generated after channel pruning of the intermediate detection model is consistent with that of the intermediate detection model, and the detection performance of the two is relatively consistent, that is, the effect of model compression is achieved while the detection performance is guaranteed.
Embodiment 3
FIG. 3 is a flowchart of a target detection method provided in Embodiment 3 of the present application. This embodiment is applicable to performing target detection on an image to be detected using a target detection model generated by the method of any embodiment above. The method may be executed by the target detection apparatus provided in the embodiments of the present application; the apparatus may be implemented in at least one of software and hardware, and may be integrated on various electronic devices.
Referring to FIG. 3, the method of this embodiment of the present application includes steps S310 to S320.
S310: Acquire an image to be detected and a target detection model generated by the method of any embodiment above.
The image to be detected may be a frame of an image, a video sequence, and so on, and the target detection model may be a visual detection model generated by the method described in any embodiment above. As an example, one way of generating the target detection model may be as shown in FIG. 4a:
First, starting from the original SSD detection model with the inception_v3 structure as the backbone network, during model training and while keeping the original parameter settings, L1 regularization constraints are imposed on the gamma coefficients in the BN layers immediately adjacent to the convolutional layers, so that the model adjusts its parameters in the direction of structural sparsity, thereby sparsifying the BN layers. Next, after the BN-layer-based sparsity training is completed, the channels in the corresponding convolutional layers and BN layers of the preliminarily trained intermediate detection model can be cut according to the BN-layer scaling coefficients at the preset pruning rate, which streamlines the model and yields a certain increase in detection speed. Finally, the streamlined model obtained by channel pruning is fine-tuned to restore the detection effect.
The channel pruning process above may be as shown in FIG. 4b. First, it must be determined which convolutional layers to prune; in this embodiment of the present application there are two restrictions: neither 1*1 convolutional layers nor the convolutional layers in the classification-regression branch are pruned, which ensures that the output dimensionality remains unchanged. Second, the gamma coefficients in the BN layers corresponding to the remaining prunable convolutional layers are collected, all collected gamma coefficients are sorted, and the pruning threshold of the gamma coefficients is computed according to the preset pruning rate. Third, the channels in the convolutional layers and BN layers are selected according to the pruning threshold, keeping the channels corresponding to gamma coefficients whose values are greater than the pruning threshold; this determines the mask (MASK) of the channels that can be retained in each BN layer and in the convolutional layer immediately adjacent to the BN layer. Finally, the corresponding channels in the convolutional layers and BN layers are retained according to the MASK, and the channels not retained are pruned away.
S320: Input the image to be detected into the target detection model, and obtain the target detection result of the target to be detected in the image to be detected according to the output result of the target detection model.
As an optional example, taking the example in the Background, the target detection method above can be applied to visual target detection on unmanned delivery vehicles in the field of intelligent logistics. Although the on-board processors of unmanned delivery vehicles are mostly built on the Xavier platform, whose computing resources are relatively limited, the target detection model involved in the target detection method above is small in scale and fast in detection, so even under the constraints of limited computing resources it can still support truly unmanned operation of unmanned delivery vehicles. Moreover, because the structured pruning operation is implemented at the channel level, the resulting streamlined model can run directly on mature frameworks such as PyTorch, MXNet, and TensorFlow, or on hardware platforms such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), without the support of a special algorithm library, making it more convenient to apply.
To verify the detection accuracy of the target detection method above, the method was tested on the 5-category subset (car, pedestrian, truck, bus, rider) of the Berkeley DeepDrive (BDD) dataset of the University of California, Berkeley; the quantitative results are shown in the two tables below. From the data in the tables, it can be seen that the structured-pruning target detection method of this embodiment of the present application can achieve a fairly obvious compression effect while leaving part of the convolutional layers and BN layers intact, while the detection result, mAP (the average over the 5 category subsets), shows only a very slight drop.
[Tables 1 and 2: quantitative compression and mAP results on the BDD 5-category subset; published as images in the original document and not recoverable here.]
In the technical solution of this embodiment of the present application, target detection can be performed on the image to be detected based on the generated target detection model; because the target detection model is a streamlined model obtained by model compression, the detection speed of the target to be detected in the image to be detected can be effectively improved, and the original performance of the model can be preserved as far as possible.
Embodiment 4
FIG. 5 is a structural block diagram of the model generation apparatus provided in Embodiment 4 of the present application; the apparatus is configured to execute the model generation method provided in any embodiment above. The apparatus and the model generation methods of the embodiments above belong to the same inventive concept; for details not described exhaustively in the embodiment of the model generation apparatus, refer to the embodiments of the model generation method above. Referring to FIG. 5, the apparatus may include a first acquisition module 410, a first screening module 420, and a model generation module 430.
The first acquisition module 410 is configured to acquire multiple scaling coefficients of the batch normalization layer in the intermediate detection model obtained after preliminary training, where the intermediate detection model is obtained after training the original detection model on multiple training samples, and each training sample includes a sample image and the sample annotation results of known targets in the sample image;
the first screening module 420 is configured to screen out coefficients to be pruned from the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients; and
the model generation module 430 is configured to screen out, from the multiple channels of the intermediate detection model, the channels to be pruned corresponding to the coefficients to be pruned, and to perform channel pruning on the channels to be pruned to generate the target detection model.
Optionally, the first screening module 420 may be configured to:
obtain a pruning threshold of the multiple scaling coefficients according to the numerical values of the multiple scaling coefficients and a preset pruning rate, and screen out the coefficients to be pruned from the multiple scaling coefficients according to the pruning threshold.
Optionally, on the basis of the apparatus above, the apparatus may further include:
a second screening module, configured to screen out prunable convolutional layers from the multiple convolutional layers of the intermediate detection model, where the prunable convolutional layers include the convolutional layers other than at least one of the 1*1 convolutional layers and the convolutional layers in the classification-regression branch; and
a third screening module, configured to screen out, from the multiple scaling coefficients, the scaling coefficients corresponding to the prunable convolutional layers, the scaling coefficients corresponding to the prunable convolutional layers being the target scaling coefficients.
The first screening module 420 may then be configured to:
screen out the coefficients to be pruned from the multiple target scaling coefficients according to the numerical values of the multiple target scaling coefficients.
Optionally, the model generation module 430 may include:
a to-be-pruned-channel screening unit, configured to screen out, from the multiple channels of the multiple convolutional layers of the intermediate detection model, the output channel of the current convolutional layer corresponding to the current pruning coefficient among the multiple coefficients to be pruned, as well as the input channel of the next convolutional layer after the current convolutional layer, and to take the output channel and the input channel as channels to be pruned.
Optionally, on the basis of the apparatus above, the apparatus may further include:
a third acquisition module, configured to acquire multiple training samples, and to perform sparsity training based on the batch normalization layer on the original detection model using the multiple training samples to obtain the intermediate detection model.
Optionally, the objective loss function in the original detection model is composed of the original loss function and an L1 regularization function, where the L1 regularization function includes a loss term that imposes the L1 constraint on the multiple scaling coefficients.
Optionally, the objective loss function L is expressed by the following formula:

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

where x is the sample image, y is the sample annotation result of the sample image, W denotes the parameter values in the original detection model, f(x, W) is the sample prediction result of the known targets in the sample image, γ is a scaling coefficient, λ is the penalty coefficient, l(·) is the original loss function, g(·) is the L1 regularization function, and Γ represents the set of all scaling coefficients in the original detection model.
Optionally, the model generation module 430 may include:
a channel pruning unit, configured to perform channel pruning on the channels to be pruned to obtain a pruned detection model; and
a fine-tuning unit, configured to fine-tune the pruned detection model to generate the target detection model.
In the model generation apparatus provided in Embodiment 4 of the present application, the first acquisition module acquires the multiple scaling coefficients of the batch normalization layer in the intermediate detection model obtained after preliminary training; the first screening module can screen out the coefficients to be pruned from the multiple scaling coefficients according to their numerical values; and, because the coefficients to be pruned and the channels to be pruned have a corresponding relationship, the model generation module can screen out the channels to be pruned corresponding to the coefficients to be pruned from the multiple channels of the intermediate detection model and perform channel pruning on them to generate the target detection model. The apparatus above combines channel pruning with the intermediate detection model and can perform channel pruning on the intermediate detection model according to the scaling coefficients of the intermediate detection model that has completed preliminary training, thereby achieving the effect of improving the model's detection speed by compressing the model.
The model generation apparatus provided in the embodiments of the present application can execute the model generation method provided in any embodiment of the present application and has the functional modules corresponding to the executed method.
It is worth noting that, in the embodiment of the model generation apparatus above, the units and modules included are divided only according to functional logic but are not limited to the division above, as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from each other and are not used to limit the scope of protection of the present application.
Embodiment Five
FIG. 6 is a structural block diagram of the object detection apparatus provided in Embodiment Five of the present application; the apparatus is configured to execute the object detection method provided in any of the above embodiments. The apparatus belongs to the same inventive concept as the object detection methods of the above embodiments; for details not exhaustively described in this apparatus embodiment, reference may be made to the above method embodiments. Referring to FIG. 6, the apparatus may include a second acquisition module 510 and an object detection module 520.
The second acquisition module 510 is configured to acquire an image to be detected and an object detection model generated by the method of either Embodiment One or Embodiment Two;
the object detection module 520 is configured to input the image to be detected into the object detection model, and obtain, according to the output of the object detection model, the object detection result of the object to be detected in the image.
In the object detection apparatus provided in Embodiment Five of the present application, the second acquisition module and the object detection module cooperate to perform object detection on the image to be detected using the generated object detection model; because the object detection model is a slimmed, compressed model, the detection speed for the object to be detected in the image is effectively improved while the original performance of the model is preserved as much as possible.
The object detection apparatus provided in the embodiments of the present application can execute the object detection method provided in any embodiment of the present application, and is provided with the functional modules corresponding to the executed method.
It is worth noting that, in the above embodiment of the object detection apparatus, the units and modules included are divided only according to functional logic, and the division is not limited thereto, as long as the corresponding functions can be implemented; in addition, the names of the functional units are merely for ease of distinguishing them from one another, and are not intended to limit the protection scope of the present application.
Embodiment Six
FIG. 7 is a structural schematic diagram of a device provided in Embodiment Six of the present application. As shown in FIG. 7, the device includes a memory 610, a processor 620, an input apparatus 630 and an output apparatus 640. There may be at least one processor 620 in the device, and one processor 620 is taken as an example in FIG. 7; the memory 610, processor 620, input apparatus 630 and output apparatus 640 in the device may be connected by a bus or in other ways, and connection by a bus 650 is taken as an example in FIG. 7.
As a computer-readable storage medium, the memory 610 may be configured to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the model generation method in the embodiments of the present application (for example, the first acquisition module 410, the first selection module 420 and the model generation module 430 in the model generation apparatus), or the program instructions/modules corresponding to the object detection method in the embodiments of the present application (for example, the second acquisition module 510 and the object detection module 520 in the object detection apparatus). The processor 620 executes the various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 610, i.e., implements the above model generation method or object detection method.
The memory 610 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the device, etc. In addition, the memory 610 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 610 may include memories remotely located relative to the processor 620, and these remote memories may be connected to the device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input apparatus 630 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output apparatus 640 may include display devices such as a display screen.
Embodiment Seven
Embodiment Seven of the present application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a model generation method, the method including:
acquiring multiple scaling coefficients of the batch normalization layers in a preliminarily trained intermediate detection model, where the intermediate detection model is obtained by training an original detection model on multiple training samples, and each training sample includes a sample image and a sample annotation result of known objects in the sample image;
selecting coefficients to be pruned from the multiple scaling coefficients according to their magnitudes;
selecting, from the multiple channels of the intermediate detection model, the channels to be pruned corresponding to the coefficients to be pruned, performing channel pruning on the channels to be pruned, and generating an object detection model.
Of course, in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the model generation method provided in any embodiment of the present application.
Embodiment Eight
Embodiment Eight of the present application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform an object detection method, the method including:
acquiring an image to be detected and an object detection model generated by the method of either Embodiment One or Embodiment Two;
inputting the image to be detected into the object detection model, and obtaining, according to the output of the object detection model, the object detection result of the object to be detected in the image.
Embodiment Nine
An embodiment of the present application further provides an object detection system. Referring to FIG. 8, the system includes an acquisition device 710, a computing device 720 and a storage device 730. The storage device 730 stores the object detection model generated in Embodiment One or Embodiment Two; the acquisition device 710 is configured to acquire an image to be detected; and the computing device 720 is configured to load the image to be detected and the object detection model, input the image to be detected into the object detection model, and obtain, according to the output of the object detection model, the object detection result of the object to be detected in the image.
Embodiment Ten
An embodiment of the present application further provides an unmanned vehicle. Referring to FIG. 9, the unmanned vehicle includes a driving device 810, a path planning device 820 and the device 830 of Embodiment Six. The driving device 810 is configured to drive the unmanned vehicle to travel along the path planned by the path planning device 820; the device 830 of Embodiment Six is configured to detect the object to be detected in the image to be detected; and the path planning device 820 is configured to plan the path of the unmanned vehicle according to the detection result, by the device of Embodiment Six, of the object to be detected in the image to be detected.
From the above description of the implementations, those skilled in the art can clearly understand that the present application may be implemented by means of software plus the necessary general-purpose hardware, and of course may also be implemented by hardware. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product; the computer software product may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.

Claims (17)

  1. A model generation method, comprising:
    acquiring a plurality of scaling coefficients of batch normalization layers in a preliminarily trained intermediate detection model, wherein the intermediate detection model is obtained by training an original detection model on a plurality of training samples, and each training sample comprises a sample image and a sample annotation result of known objects in the sample image;
    selecting coefficients to be pruned from the plurality of scaling coefficients according to magnitudes of the plurality of scaling coefficients;
    selecting, from a plurality of channels of the intermediate detection model, channels to be pruned corresponding to the coefficients to be pruned, performing channel pruning on the channels to be pruned, and generating an object detection model.
  2. The method of claim 1, wherein the selecting coefficients to be pruned from the plurality of scaling coefficients according to the magnitudes of the plurality of scaling coefficients comprises:
    deriving a pruning threshold for the plurality of scaling coefficients from the magnitudes of the plurality of scaling coefficients and a preset pruning rate, and selecting the coefficients to be pruned from the plurality of scaling coefficients according to the pruning threshold.
  3. The method of claim 1, further comprising:
    selecting prunable convolutional layers from a plurality of convolutional layers of the intermediate detection model, wherein the prunable convolutional layers comprise convolutional layers other than at least one of 1*1 convolutional layers and convolutional layers in classification and regression branches;
    selecting, from the plurality of scaling coefficients, scaling coefficients corresponding to the prunable convolutional layers, the scaling coefficients corresponding to the prunable convolutional layers being target scaling coefficients;
    wherein the selecting coefficients to be pruned from the plurality of scaling coefficients according to the magnitudes of the plurality of scaling coefficients comprises: selecting the coefficients to be pruned from the plurality of target scaling coefficients according to magnitudes of the plurality of target scaling coefficients.
  4. The method of claim 1, wherein there are a plurality of coefficients to be pruned, and the selecting, from the plurality of channels of the intermediate detection model, the channels to be pruned corresponding to the coefficients to be pruned comprises:
    selecting, from a plurality of channels of a plurality of convolutional layers of the intermediate detection model, an output channel of a current convolutional layer corresponding to a current pruning coefficient among the plurality of coefficients to be pruned, and an input channel of a convolutional layer following the current convolutional layer, and taking the output channel and the input channel as the channels to be pruned.
  5. The method of claim 1, further comprising:
    acquiring the plurality of training samples, and performing batch-normalization-layer-based sparsity training on the original detection model using the plurality of training samples, to obtain the intermediate detection model.
  6. The method of claim 5, wherein a target loss function in the original detection model consists of an original loss function and an L1 regularization constraint function, and the L1 regularization constraint function comprises a loss function imposing an L1 regularization constraint on the plurality of scaling coefficients.
  7. The method of claim 6, wherein the target loss function L is expressed by the following formula:
    L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)
    where x is a sample image, y is the sample annotation result of the sample image, W denotes parameter values in the original detection model, f(x, W) is the sample prediction result of the known objects in the sample image, γ is a scaling coefficient, λ is a penalty coefficient, l() is the original loss function, g() is the L1 regularization constraint function, and Γ denotes the set of all scaling coefficients in the original detection model.
  8. The method of claim 1, wherein the performing channel pruning on the channels to be pruned and generating the object detection model comprises:
    performing channel pruning on the channels to be pruned to obtain a pruned detection model;
    fine-tuning the pruned detection model to generate the object detection model.
  9. The method of claim 1, wherein the original detection model comprises a Single Shot MultiBox Detector (SSD).
  10. The method of claim 9, wherein a backbone network of the SSD comprises an inception_v3 structure.
  11. An object detection method, comprising:
    acquiring an image to be detected and an object detection model generated by the method of any one of claims 1-10;
    inputting the image to be detected into the object detection model, and obtaining, according to an output of the object detection model, an object detection result of an object to be detected in the image to be detected.
  12. A model generation apparatus, comprising:
    a first acquisition module, configured to acquire a plurality of scaling coefficients of batch normalization layers in a preliminarily trained intermediate detection model, wherein the intermediate detection model is obtained by training an original detection model on a plurality of training samples, and each training sample comprises a sample image and a sample annotation result of known objects in the sample image;
    a first selection module, configured to select coefficients to be pruned from the plurality of scaling coefficients according to magnitudes of the plurality of scaling coefficients;
    a model generation module, configured to select, from a plurality of channels of the intermediate detection model, channels to be pruned corresponding to the coefficients to be pruned, perform channel pruning on the channels to be pruned, and generate an object detection model.
  13. An object detection apparatus, comprising:
    a second acquisition module, configured to acquire an image to be detected and an object detection model generated by the method of any one of claims 1-10;
    an object detection module, configured to input the image to be detected into the object detection model, and obtain, according to an output of the object detection model, an object detection result of an object to be detected in the image to be detected.
  14. A device, comprising:
    at least one processor; and
    a memory configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the model generation method of any one of claims 1-10.
  15. A device, comprising:
    at least one processor; and
    a memory configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the object detection method of claim 11.
  16. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the model generation method of any one of claims 1-10, or the object detection method of claim 11.
  17. An unmanned vehicle, comprising a driving device, a path planning device and the device of claim 15, wherein the driving device is configured to drive the unmanned vehicle to travel along a path planned by the path planning device, the device of claim 15 is configured to detect an object to be detected in an image to be detected, and the path planning device is configured to plan the path of the unmanned vehicle according to a detection result, by the device of claim 15, of the object to be detected in the image to be detected.
PCT/CN2021/079690 2020-03-17 2021-03-09 Model generation method, object detection method, apparatus, device and storage medium WO2021185121A1 (zh)
