WO2023221415A1 - Method, Apparatus, Device and Storage Medium for Generating a Backbone Network - Google Patents


Info

Publication number
WO2023221415A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolution kernel
convolution
backbone network
layer
network
Prior art date
Application number
PCT/CN2022/130496
Other languages
English (en)
French (fr)
Inventor
崔程
郜廷权
魏胜禹
董水龙
郭若愚
杜宇宁
赖宝华
刘其文
胡晓光
于佃海
马艳军
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2023221415A1 publication Critical patent/WO2023221415A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N 20/00 Machine learning
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models

Definitions

  • the present disclosure relates to the technical field of artificial intelligence, specifically to the technical field of deep learning and computer vision, and in particular to methods, devices, equipment, storage media and computer program products for generating backbone networks.
  • the present disclosure provides a method, device, equipment, storage medium and computer program product for generating a backbone network, which can improve the inference speed of the backbone network while taking into account network accuracy and saving GPU hardware resources.
  • a method for generating a backbone network is provided.
  • the backbone network is used for a graphics processing unit (GPU), including:
  • the target backbone network is generated.
  • an image processing method including:
  • a device for generating a backbone network is provided.
  • the backbone network is applied to a graphics processing unit (GPU), including:
  • the acquisition module is configured to obtain the calculation density of multiple convolution kernels of different sizes
  • the determination module is configured to determine the convolution kernel with the highest calculation density as the first convolution kernel
  • the first generation module is configured to generate the target backbone network based on the first convolution kernel.
  • an image processing device including:
  • a second generation module configured to utilize the backbone network as provided in the first aspect or the second aspect to generate an image processing model for use in the field of computer vision
  • the obtaining module is configured to input the computer vision image to be processed into the image processing model and obtain the image processing result.
  • an electronic device including:
  • a memory communicatively connected to at least one processor; wherein,
  • the memory stores instructions that can be executed by at least one processor, and the instructions are executed by at least one processor, so that at least one processor can execute the method provided by the first aspect or the second aspect.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method provided in the first or second aspect.
  • a computer program product including a computer program that, when executed by a processor, implements the method provided according to the first or second aspect.
  • Figure 1 shows an exemplary system architecture to which the generation method of the backbone network of the present disclosure can be applied
  • Figure 2 shows a flow chart of a first embodiment of a method for generating a backbone network according to the present disclosure
  • Figure 3 shows a flow chart of a second embodiment of a backbone network generation method according to the present disclosure
  • Figure 4 shows an exemplary schematic diagram of a convolutional network generated in an embodiment of the present disclosure
  • Figure 5 shows a flow chart of a third embodiment of a backbone network generation method according to the present disclosure
  • Figure 6 shows a flow chart of a fourth embodiment of a backbone network generation method according to the present disclosure
  • Figure 7 shows a flowchart of an embodiment of an image processing method according to the present disclosure
  • Figure 8 shows a schematic structural diagram of an embodiment of a backbone network generation device according to the present disclosure
  • Figure 9 shows a schematic structural diagram of an embodiment of an image processing device according to the present disclosure.
  • FIG. 10 shows a block diagram of an electronic device used to implement a backbone network generation method or an image processing method according to an embodiment of the present disclosure.
  • backbone networks have achieved great success in academia. However, because their actual running speed is not directly proportional to metrics commonly used in academia, such as FLOPs (floating point operations, also called computation volume), only a few backbone networks are adopted in industry.
  • the present disclosure provides a method for generating a backbone network.
  • the backbone network can be applied to a graphics processing unit (GPU), which can improve the inference speed of the backbone network while taking into account network accuracy, saving GPU hardware resources and costs.
  • FIG. 1 shows an exemplary system architecture 100 in which embodiments of the backbone network generation method or the backbone network generation apparatus of the present disclosure can be applied.
  • the system architecture 100 may include a terminal device 101 , a network 102 and a server 103 .
  • the network 102 is used to provide a communication link between the terminal device 101 and the server 103, and may include various connection types, such as wired communication links, wireless communication links, optical fiber cables, etc.
  • the user can use the terminal device 101 to interact with the server 103 through the network 102 to receive or send information, etc.
  • Various client applications can be installed on the terminal device 101.
  • the terminal device 101 may be hardware or software.
  • the terminal device 101 can be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and so on.
  • when the terminal device 101 is software, it can be installed in the above-mentioned electronic devices. It can be implemented as multiple software programs or software modules, or as a single software program or software module. There are no specific limitations here.
  • the server 103 may be hardware or software. When the server 103 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or it can be implemented as a single server. When the server 103 is software, it may be implemented as multiple software or software modules (for example, used to provide distributed services), or it may be implemented as a single software or software module. There are no specific limitations here.
  • the backbone network generation method provided by the embodiments of the present disclosure is generally executed by the server 103.
  • the backbone network generation device is generally provided in the server 103.
  • terminal devices 101, networks 102 and servers 103 in Figure 1 are only illustrative. Depending on implementation needs, there may be any number of terminal devices 101, networks 102, and servers 103.
  • FIG. 2 shows a process 200 of an embodiment of a backbone network generation method according to the present disclosure.
  • the backbone network generation method includes the following steps:
  • Step 201 Obtain the calculation densities of multiple convolution kernels of different sizes.
  • the execution body of the backbone network generation method such as the server 103 shown in FIG. 1 , obtains the calculation densities of multiple convolution kernels of different sizes.
  • the calculation density of the convolution kernel can be used to guide the calculation density of the generated backbone network.
  • the execution body of the backbone network generation method obtains the calculation densities of multiple commonly used convolution kernels, for example, Conv (convolution) 5×5, Conv3×3, Conv1×1, DW (depthwise) 3×3 and DW5×5 convolution kernels.
  • Step 202 Determine the convolution kernel with the highest calculation density as the first convolution kernel.
  • the above-mentioned execution subject determines that the convolution kernel with the highest calculation density among multiple convolution kernels of different sizes is the first convolution kernel. That is to say, the calculation densities of multiple convolution kernels of different sizes are compared, and the convolution kernel with the highest calculation density is determined as the first convolution kernel.
  • for example, the calculation density of the Conv5×5 convolution kernel obtained by the above execution subject is 546, the calculation density of the Conv3×3 convolution kernel is 598, the calculation density of the Conv1×1 convolution kernel is 373, the calculation density of the DW3×3 convolution kernel is 26, and the calculation density of the DW5×5 convolution kernel is 58.
  • based on this, it can be determined that the Conv3×3 convolution kernel is the first convolution kernel.
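Step 202 can be sketched in a few lines of plain Python. The density values below are the FLOPs-to-latency ratios quoted in this example; the dictionary keys are illustrative names for the kernels:

```python
# Compute densities of the candidate convolution kernels, as quoted
# in this example (ratio of FLOPs to latency).
densities = {
    "Conv5x5": 546,
    "Conv3x3": 598,
    "Conv1x1": 373,
    "DW3x3": 26,
    "DW5x5": 58,
}

# Step 202: the first convolution kernel is the one with the
# highest compute density.
first_kernel = max(densities, key=densities.get)
print(first_kernel)  # Conv3x3
```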
  • Step 203 Generate a target backbone network based on the first convolution kernel.
  • the above execution subject generates the target backbone network based on the first convolution kernel determined in step 202.
  • this disclosed embodiment generates the target backbone network based on the convolution kernel with the highest computational density among multiple convolution kernels of different sizes, which greatly improves the computational density of the backbone network, allowing the backbone network to perform more floating-point operations at the same inference speed and thus have better data fitting capability.
  • the method for generating a backbone network provided by embodiments of the present disclosure first obtains the calculation densities of multiple convolution kernels of different sizes; then determines the convolution kernel with the highest calculation density as the first convolution kernel; and then generates the target backbone network based on the first convolution kernel.
  • the disclosed backbone network generation method generates the target backbone network based on the convolution kernel with the highest computational density.
  • the generated target backbone network therefore has a greater computational density and performs more floating-point operations at the same inference speed, which greatly improves its data fitting capability.
  • FIG. 3 shows a process 210 of the second embodiment of the backbone network generation method of the present disclosure.
  • the generation method of the backbone network includes the following steps:
  • Step 211 Obtain the calculation densities of multiple convolution kernels of different sizes.
  • the execution body of the backbone network generation method such as the server 103 shown in FIG. 1 , obtains the calculation densities of multiple convolution kernels of different sizes.
  • Step 211 is basically the same as step 201 in the previous embodiment.
  • Step 212 Determine the convolution kernel with the highest calculation density as the first convolution kernel.
  • the execution subject determines that the convolution kernel with the highest calculation density among multiple convolution kernels of different sizes is the first convolution kernel.
  • Step 212 is basically the same as step 202 in the previous embodiment.
  • for specific implementation, please refer to the previous description of step 202, which will not be repeated here.
  • Step 213 Generate a convolution layer based on the first convolution kernel.
  • the above execution body generates a convolution layer based on the first convolution kernel determined in step 212.
  • the convolutional layer is used in the backbone network to extract different features of the input for convolution operations. Therefore, improving the computing density, reasoning speed and data fitting capabilities of the convolutional layer can improve the computing density, reasoning speed and data fitting capabilities of the backbone network.
  • a convolution layer may include a first convolution kernel.
  • the above execution entity generates a convolution layer based on the first convolution kernel with the highest computational density, which can greatly increase the computational density of the convolution layer and provide more floating point operations at the same inference speed, thus improving the convolution layer’s reasoning speed and data fitting capabilities, thereby improving the reasoning speed and data fitting capabilities of the backbone network.
  • the process of generating the convolution layer may include: stacking and fusing multiple first convolution kernels to generate a convolution network including multiple convolution layers.
  • the convolutional network generated after stacking and fusing multiple first convolution kernels can be used to expand the receptive field in the backbone network, so that the backbone network can capture a larger receptive field.
  • a larger receptive field can be captured by stacking multiple first convolution kernels with the highest computational density.
  • corresponding features are extracted for concatenation and fusion, which not only obtains a larger receptive field but also integrates information from different receptive fields, greatly improving the data fitting capability of the network and providing better feature integration for tasks that require different receptive fields, such as target detection (e.g. in image processing).
  • the first convolution kernel may be a standard convolution kernel; for example, the first convolution kernel may be a Conv3×3 convolution kernel.
  • FIG 4 shows a schematic diagram of the convolution network 300 generated by the first convolution kernel in the embodiment of the present disclosure.
  • the first convolution kernel 301 is a Conv3×3 convolution kernel.
  • four Conv3×3 convolution kernels are stacked and their outputs fused by concatenation (Concat).
  • the four Conv3×3 convolution kernels are stacked and convolved, combining receptive fields of 3×3, 5×5, 7×7 and 9×9 at the same time to form a convolutional network with multiple convolution layers.
  • the number of channels of the Conv3×3 convolution kernel is 32.
  • the number of channels after stacking and fusing the four Conv3×3 convolution kernels is 128, and the dimensionality is then reduced back to 32 through a Conv1×1 convolution kernel. This enlarges the captured receptive field without increasing the parameter count or computation, which both improves the data fitting ability of the convolutional network and preserves inference speed.
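The receptive-field and channel arithmetic of this example can be verified with a short sketch. It assumes stride-1 Conv3×3 branches of depth one to four and 32 channels per branch, as in the example above; the helper function name is ours:

```python
# Receptive field of n stacked stride-1 k x k convolutions:
# each additional layer grows the receptive field by (k - 1).
def stacked_receptive_field(n, k=3):
    return 1 + n * (k - 1)

# Branches of depth 1..4 give the 3x3, 5x5, 7x7 and 9x9 fields
# combined in the convolutional network of Figure 4.
fields = [stacked_receptive_field(n) for n in (1, 2, 3, 4)]
print(fields)  # [3, 5, 7, 9]

# Channel bookkeeping: four 32-channel branches are concatenated
# (4 * 32 = 128), then a Conv1x1 reduces the dimensionality to 32.
concat_channels = 4 * 32
reduce_params = concat_channels * 32 * 1 * 1  # 1x1 conv weights, bias omitted
print(concat_channels, reduce_params)  # 128 4096
```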
  • Step 214 Construct a downsampling layer.
  • the downsampling layer includes a second convolution kernel.
  • the type of the second convolution kernel is different from the type of the first convolution kernel.
  • the downsampling layer can effectively reduce the amount of calculation.
  • in the related art, downsampling layers use either a two-dimensional pooling (Pool2D) operation or a standard convolution. The Pool2D operation has no learnable parameters, so the downsampling part reduces the data fitting ability; a standard convolution enhances the data fitting ability but brings a large number of parameters and computations, which is not conducive to inference deployment.
  • the second convolution kernel of the downsampling layer uses a depthwise convolution kernel (DW convolution kernel).
  • on the one hand, it has learnable parameters, which increases the data fitting ability of the downsampling layer.
  • on the other hand, it does not increase the parameter count or computation, which improves the inference speed of the backbone network.
  • the second convolution kernel is a DW convolution kernel with a stride of 2.
  • since the number of parameters of the DW convolution kernel is small, no activation function needs to be added in the downsampling layer, which avoids reducing the data fitting ability of the backbone network.
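The trade-off described above can be made concrete with parameter counts for a hypothetical 32-channel, 3×3, stride-2 downsampling layer (the channel count is an assumption for illustration; biases are omitted):

```python
c, k = 32, 3  # assumed channel count and kernel size

pool2d_params = 0                      # pooling has nothing to learn
standard_conv_params = c * c * k * k   # full convolution over all channels
depthwise_conv_params = c * k * k      # one k x k filter per channel

# The DW kernel is learnable like the standard convolution, but
# roughly c times cheaper in parameters.
print(pool2d_params, standard_conv_params, depthwise_conv_params)  # 0 9216 288
```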
  • Step 215 Generate the target backbone network based on the convolutional layer and the downsampling layer.
  • the above execution subject generates the target backbone network based on the convolution layer generated in step 213 and the downsampling layer constructed in step 214.
  • the convolutional network includes multiple stages.
  • step 215 is to generate the target backbone network based on the convolutional network and the downsampling layer, which may include: setting a downsampling layer between each two adjacent stages in the convolutional network to obtain the target backbone network.
  • the stages of the convolutional network can be divided according to factors such as network function, convolution kernel size, and receptive field size.
  • the method for generating the backbone network uses the first convolution kernel with the highest computational density to generate the convolution layer, which can capture a larger receptive field and integrate information from different receptive fields, and uses a DW convolution kernel with a stride of 2 to build the downsampling layer, improving data fitting capability and inference speed.
  • the target backbone network generated by the embodiments of the present disclosure can not only capture a larger receptive field, but also have better data fitting capabilities and inference speed.
  • FIG. 5 shows a process 220 of the third embodiment of the backbone network generation method of the present disclosure.
  • the generation method of the backbone network includes the following steps:
  • Step 221 Obtain the floating point operands and delays of multiple convolution kernels of different sizes in the convolutional network of the basic network model.
  • the above execution subject obtains a basic network model and uses the model and its information as the basis; it then obtains multiple convolution kernels of different sizes from the convolutional network of the basic network model, together with the floating point operations and delays of these convolution kernels.
  • the basic network model can be the currently applied CNN (Convolutional Neural Network, convolutional neural network) network model.
  • the basic network model obtained by the execution subject may be one or multiple.
  • Multiple convolution kernels of different sizes may be the most commonly used convolution kernels of multiple different sizes in the convolutional network of the one or more basic network models.
  • the multiple convolution kernels of different sizes may include Conv5×5, Conv3×3, Conv1×1, DW3×3, DW5×5 and other convolution kernels.
  • the floating point operation number FLOPs and latency of each convolution kernel obtained by the above execution subject are shown in Table 1. It should be noted that the data in Table 1 is the result of the multi-layer combination of each convolution kernel itself.
  • Table 1 Floating point operations and delays of multiple convolution kernels of different sizes
  • Convolution kernel | FLOPs (M) | Latency (ms) | Calculation density
    Conv5×5 | 161061 | 294.73 | 546
    Conv3×3 | 57982 | 97.03 | 598
    Conv1×1 | 6442 | 17.29 | 373
    DW3×3 | 113 | 4.36 | 26
    DW5×5 | 314 | 5.43 | 58
  • the floating-point operation number FLOPs and delays of multiple convolution kernels of different sizes are obtained and used as a basis for calculating the calculation density of each convolution kernel.
  • Step 222 Determine the calculation density of multiple convolution kernels of different sizes based on floating point operations and delays.
  • the above-mentioned execution subject determines the calculation density of each convolution kernel based on the floating point operation number and delay of the convolution kernel obtained in step 221. For example, the above execution subject calculates the ratio of the floating point operation number and the delay of each corresponding convolution kernel, and uses the ratio as the calculation density of the corresponding convolution kernel.
  • the above execution subject calculated that the calculation density of the Conv5×5 convolution kernel is 546, the calculation density of the Conv3×3 convolution kernel is 598, the calculation density of the Conv1×1 convolution kernel is 373, the calculation density of the DW3×3 convolution kernel is 26, and the calculation density of the DW5×5 convolution kernel is 58.
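These densities follow directly from Table 1: each is the ratio of the FLOPs figure (in M) to the latency (in ms), rounded to the nearest integer:

```python
# (FLOPs in M, latency in ms) per convolution kernel, from Table 1.
table = {
    "Conv5x5": (161061, 294.73),
    "Conv3x3": (57982, 97.03),
    "Conv1x1": (6442, 17.29),
    "DW3x3": (113, 4.36),
    "DW5x5": (314, 5.43),
}

density = {name: round(flops / latency) for name, (flops, latency) in table.items()}
print(density)
# {'Conv5x5': 546, 'Conv3x3': 598, 'Conv1x1': 373, 'DW3x3': 26, 'DW5x5': 58}
```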
  • Step 223 Determine the convolution kernel with the highest calculation density as the first convolution kernel.
  • the above-mentioned execution subject determines the convolution kernel with the highest calculation density among multiple convolution kernels of different sizes as the first convolution kernel based on the determination result of step 222.
  • the Conv3×3 convolution kernel is determined to be the first convolution kernel.
  • Step 223 is basically the same as step 202 in the previous embodiment.
  • for specific implementation, please refer to the previous description of step 202, which will not be repeated here.
  • Step 224 Generate a convolutional network based on the first convolution kernel.
  • the above execution body generates a convolution layer based on the first convolution kernel determined in step 223.
  • Step 224 is basically consistent with step 213 in the previous embodiment. For specific implementation, please refer to the previous description of step 213, which will not be described again here.
  • in addition, the average number of channels of the convolution kernels in the convolutional network is smaller than the last output channel number of each stage of the convolutional network in the basic network model, so as to reduce the parameter count and computation of the convolutional network, thereby reducing the parameter count and computation of the generated backbone network and improving the inference speed.
  • Step 225 Construct a downsampling layer.
  • the downsampling layer includes a second convolution kernel.
  • the type of the second convolution kernel is different from the type of the first convolution kernel.
  • the above-mentioned execution subject constructs a downsampling layer through the second convolution kernel to further reduce the amount of calculation and improve the data fitting ability.
  • Step 225 is basically the same as step 214 in the previous embodiment.
  • Step 226 Generate the target backbone network based on the convolutional network and downsampling layer.
  • the above execution body generates the target backbone network based on the convolution layer generated in step 224 and the downsampling layer constructed in step 225.
  • Step 226 is basically consistent with step 215 in the previous embodiment. For specific implementation, please refer to the previous description of step 215, which will not be described again here.
  • in the backbone network generation method of this embodiment, the floating point operations and delays of multiple convolution kernels of different sizes are obtained, and based on these, the calculation densities of the multiple convolution kernels of different sizes are determined.
  • Figure 6 shows a process 230 of the fourth embodiment of the backbone network generation method of the present disclosure.
  • the generation method of the backbone network includes the following steps:
  • Step 231 Obtain the calculation densities of multiple convolution kernels of different sizes.
  • the execution body of the backbone network generation method such as the server 103 shown in FIG. 1 , obtains the calculation densities of multiple convolution kernels of different sizes.
  • Step 231 is basically consistent with step 201 or steps 221-222 in the previous embodiment.
  • for specific implementation, please refer to the previous description of step 201 or steps 221-222, which will not be repeated here.
  • Step 232 Determine the convolution kernel with the highest calculation density as the first convolution kernel.
  • the above-mentioned execution subject determines that the convolution kernel with the highest calculation density among multiple convolution kernels of different sizes is the first convolution kernel.
  • Step 232 is basically consistent with step 202 or step 223 in the previous embodiment.
  • Step 233 Generate a convolutional network based on the first convolution kernel.
  • Step 233 is basically consistent with step 213 or step 224 in the previous embodiment.
  • Step 234 Construct a downsampling layer.
  • the downsampling layer includes a second convolution kernel.
  • the type of the second convolution kernel is different from the type of the first convolution kernel.
  • the above-mentioned execution subject constructs a downsampling layer through the second convolution kernel to further reduce the amount of calculation and improve the data fitting ability.
  • Step 234 is basically the same as step 214 in the previous embodiment.
  • for specific implementation, please refer to the previous description of step 214, which will not be repeated here.
  • Step 235 Construct a global pooling layer, a fully connected layer and a classification layer in sequence after the convolutional network.
  • the above execution body constructs a GAP (Global Average Pooling) layer, an FC (Fully Connected) layer and a classification layer in sequence after the convolutional network.
  • the global pooling layer is used to perform overall mean pooling of the feature data of the convolutional network to further reduce the amount of parameters.
  • the fully connected layer is used to integrate the highly abstract features produced by the preceding convolutions and then normalize them to output a probability for each classification case, so that the subsequent classification layer can classify based on the probabilities output by the fully connected layer.
  • the global pooling layer is directly connected to the classification layer.
  • adding a fully connected layer between the global pooling layer and the classification layer adds very few FLOPs and does not affect the inference speed, but can greatly improve the final accuracy of the backbone network.
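A rough count shows why the extra fully connected layer is nearly free. The feature dimension and class count below are hypothetical, chosen only to illustrate the order of magnitude:

```python
feat_dim = 512      # assumed pooled feature dimension after GAP
num_classes = 1000  # assumed number of classes

# Multiply-adds of the fully connected layer (bias omitted).
fc_flops = feat_dim * num_classes
print(fc_flops / 1e6)  # 0.512 MFLOPs, negligible next to the conv body
```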
  • Step 236 Generate a target backbone network based on the convolutional network, downsampling layer, global pooling layer, fully connected layer and classification layer.
  • the above execution subject generates the final target backbone network based on the convolution layer, downsampling layer, global pooling layer, fully connected layer and classification layer generated and constructed sequentially in steps 233-235.
  • the output of the convolutional network including multiple convolutional layers is used as the input of the global pooling layer, and the output of the global pooling layer is used as the input of the fully connected layer.
  • the output of the fully connected layer is used as the input of the classification layer
  • the output of the classification layer is used as the output of the backbone network.
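The data flow of steps 233-236 can be summarized with simple shape bookkeeping. The stage count, channel width, input size and class count below are all illustrative assumptions, not values fixed by the disclosure:

```python
# Track (channels, height, width) through an assumed backbone layout:
# convolutional stages separated by stride-2 DW downsampling layers,
# followed by global average pooling, a fully connected layer and a
# classification layer.
def backbone_shapes(h=224, w=224, channels=32, stages=3, num_classes=1000):
    shapes = []
    for s in range(stages):
        shapes.append(("stage%d" % (s + 1), channels, h, w))
        if s < stages - 1:       # stride-2 DW downsample between stages
            h, w = h // 2, w // 2
    shapes.append(("gap", channels, 1, 1))  # global average pooling
    shapes.append(("fc+cls", num_classes))  # fully connected + classifier
    return shapes

for shape in backbone_shapes():
    print(shape)
```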
  • in the backbone network generation method of this embodiment, based on the current basic network model, the floating point operations and delays of multiple convolution kernels of different sizes are obtained, and based on these, the calculation densities of the multiple convolution kernels of different sizes are determined.
  • the generated backbone network can be used to build machine learning models in computer vision, for example, machine learning models in the field of object detection.
  • FIG. 7 shows a process 400 of an embodiment of an image processing method according to the present disclosure.
  • the image processing method includes the following steps:
  • Step 401 Use the backbone network to generate an image processing model for the field of computer vision.
  • the execution subject of the image processing method can use the backbone network to generate an image processing model for the field of computer vision.
  • the backbone network used by the execution subject may be a backbone network generated according to the backbone network generation method described above in this disclosure.
  • the backbone network may include a convolution layer generated from the first convolution kernel with the highest computational density, a downsampling layer based on a second convolution kernel of a different type from the first convolution kernel, a global pooling layer, a fully connected layer and a classification layer, in which multiple convolution layers form a convolutional network and the downsampling layer is set between two adjacent stages of the convolutional network.
  • Step 402 Input the computer vision image to be processed into the image processing model to obtain the image processing result.
  • the above-mentioned execution subject directly inputs the computer vision image to be processed into the image processing model.
  • the image processing model extracts image features based on the backbone network, processes them, and outputs them to obtain the image processing results.
  • the computer vision image to be processed can be selected and uploaded by the user from existing images, or taken by the user through the camera of the terminal device, and it may contain an image of any person or thing, which is not specifically limited in this embodiment.
  • the backbone network serves as the basic feature extractor for the target detection task.
  • the main task of target detection is to take an image as input and output the feature map of the corresponding input image.
  • after the computer vision image to be processed is input into the image processing model, the backbone network performs image segmentation on the input image to obtain some original regions, extracts the image features in those regions, classifies the extracted features, and finally obtains the detected target object.
  • the image processing method provided by the embodiment of the disclosure generates an image processing model for the field of computer vision based on the backbone network generated by the backbone network generation method provided by the disclosure; the computer vision image to be processed is then input into the image processing model to obtain the image processing result.
  • the image processing method of this embodiment uses the backbone network generated by the aforementioned method to extract and process image features, which improves the speed and accuracy of extracting and processing image features, thereby improving image processing efficiency and processing effects.
  • Figure 8 shows an embodiment of a device for generating a backbone network according to the present disclosure.
  • This device embodiment corresponds to the method embodiment shown in Figure 2.
  • the device can be applied to various electronic devices.
  • the backbone network generation device 500 includes: an acquisition module 501 , a determination module 502 and a first generation module 503 .
  • the obtaining module 501 is configured to obtain the computational density of multiple convolution kernels of different sizes
  • the determination module 502 is configured to determine the convolution kernel with the largest computational density as the first convolution kernel
  • the first generation module 503 is configured to generate the target backbone network based on the first convolution kernel.
  • for the specific processing of the acquisition module 501, the determination module 502 and the first generation module 503 and the technical effects they bring, reference may be made to the descriptions of steps 201-203 in the corresponding embodiment of Figure 2, which will not be repeated here.
  • the first generation module includes: a first generation sub-module, a first construction sub-module and a second generation sub-module.
  • the first generation sub-module is configured to generate a convolution layer based on the first convolution kernel
  • the first construction sub-module is configured to construct a downsampling layer, where the downsampling layer includes a second convolution kernel
  • the type of the second convolution kernel is different from the type of the first convolution kernel
  • the second generation sub-module is configured to generate the target backbone network based on the convolution layer and the downsampling layer.
  • the first convolution kernel is a 3×3 standard convolution kernel
  • the second convolution kernel is a depthwise convolution kernel with a stride of 2.
  • the first generation sub-module is configured to stack and fuse multiple first convolution kernels to generate a convolutional network including multiple convolutional layers.
  • the convolutional network includes multiple stages, and the second generation sub-module is configured to set a downsampling layer between each two adjacent stages in the convolutional network, Obtain the target backbone network.
  • the acquisition module includes: an acquisition sub-module and a first determination sub-module.
  • the acquisition sub-module is configured to obtain the floating point operations and latency of multiple convolution kernels of different sizes in the convolutional network of the base network model;
  • the first determination sub-module is configured to determine the computational density of the multiple convolution kernels of different sizes according to the floating point operations and the latency.
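The ratio these modules compute — floating point operations divided by latency — can be sketched in a few lines of Python. The figures below are the illustrative values from Table 1 of this disclosure; in practice they would be profiled on the target GPU.

```python
# Computational density of a convolution kernel, as defined in this
# disclosure: FLOPs divided by measured latency. Higher density means
# better GPU utilization at the same inference speed.
kernels = {
    "Conv5x5": {"flops_m": 161061, "latency_ms": 294.73},
    "Conv3x3": {"flops_m": 57982,  "latency_ms": 97.03},
    "Conv1x1": {"flops_m": 6442,   "latency_ms": 17.29},
    "DW3x3":   {"flops_m": 113,    "latency_ms": 4.36},
    "DW5x5":   {"flops_m": 314,    "latency_ms": 5.43},
}

def computational_density(flops_m: float, latency_ms: float) -> float:
    """Density = FLOPs / latency."""
    return flops_m / latency_ms

densities = {name: computational_density(k["flops_m"], k["latency_ms"])
             for name, k in kernels.items()}

# The kernel with the largest density becomes the first convolution kernel.
first_kernel = max(densities, key=densities.get)
print(first_kernel, round(densities[first_kernel]))  # Conv3x3 598
```

With these values the Conv3×3 kernel is selected, matching the embodiment described above.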
  • the average number of channels of the convolution kernels in the convolutional network is smaller than the final output channel number of the last stage of the convolutional network in the base network model.
  • the first generation module further includes a second building sub-module.
  • the second building sub-module is configured to sequentially build a global pooling layer, a fully connected layer and a classification layer after the convolutional network.
  • for the specific processing of the second building sub-module, reference may be made to step 235 in the corresponding embodiment of FIG. 6, which will not be described again here.
  • the backbone network is used to build a machine learning model in the field of computer vision.
  • Figure 9 shows an embodiment of an image processing device provided according to the present disclosure.
  • the device embodiment corresponds to the method embodiment shown in Figure 7.
  • the device can be applied to various electronic devices.
  • the image processing device 600 includes: a second generating module 601 and an obtaining module 602 .
  • the second generation module 601 is configured to generate an image processing model for the field of computer vision using the backbone network provided by the first aspect or the second aspect;
  • the obtaining module 602 is configured to input the computer vision image to be processed into the image processing model to obtain the image processing result.
  • the present disclosure also provides an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.
  • the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above method for generating a backbone network.
  • a non-transitory computer-readable storage medium storing computer instructions is used to cause the computer to execute the above method for generating a backbone network.
  • a computer program product includes a computer program, and when executed by a processor, the computer program implements the above method for generating a backbone network.
  • FIG. 10 shows a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700.
  • Computing unit 701, ROM 702 and RAM 703 are connected to each other via bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • multiple components in the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disk; and a communication unit 709, such as a network card, a modem or a wireless communication transceiver.
  • the communication unit 709 allows the device 700 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
  • Computing unit 701 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 701 performs various methods and processes described above, such as a backbone network generation method or an image processing method.
  • the backbone network generation method or the image processing method may be implemented as a computer software program, which is tangibly included in a machine-readable medium, such as the storage unit 708.
  • part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709.
  • When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the backbone network generation method or the image processing method described above may be performed.
  • the computing unit 701 may be configured to perform the backbone network generation method or the image processing method in any other suitable manner (eg, by means of firmware).
  • Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor, and which may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • to provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, voice input, or tactile input.
  • the systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include: local area networks (LAN), wide area networks (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, a distributed system server, or a server combined with a blockchain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method, apparatus, device and storage medium for generating a backbone network, relating to the field of artificial intelligence technology, and in particular to the fields of deep learning and computer vision. The backbone network is applied to a graphics processing unit. The method for generating the backbone network includes: obtaining the computational density of multiple convolution kernels of different sizes; determining the convolution kernel with the largest computational density as a first convolution kernel; and generating a target backbone network based on the first convolution kernel. The backbone network generation method provided by the present disclosure enables the backbone network to obtain a larger receptive field, achieve both fast inference speed and high accuracy on a graphics processing unit, and reduce costs.

Description

Method, Apparatus, Device and Storage Medium for Generating a Backbone Network
This patent application claims priority to Chinese Patent Application No. 202210551186.6, filed on May 18, 2022 and entitled "Method, Apparatus, Device and Storage Medium for Generating a Backbone Network", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, specifically to the fields of deep learning and computer vision, and in particular to a method, apparatus, device, storage medium and computer program product for generating a backbone network.
Background
In the field of computer vision based on deep learning, almost all tasks rely on a backbone network for feature extraction. However, current backbone networks do not make sufficiently efficient use of graphics processing unit (GPU) devices, and it is difficult to balance inference speed and accuracy.
Summary
The present disclosure provides a method, apparatus, device, storage medium and computer program product for generating a backbone network, which can improve the inference speed of the backbone network while maintaining network accuracy and saving GPU hardware resources.
According to a first aspect of the present disclosure, a method for generating a backbone network is provided, where the backbone network is used on a graphics processing unit, including:
obtaining the computational density of multiple convolution kernels of different sizes;
determining the convolution kernel with the largest computational density as a first convolution kernel; and
generating a target backbone network based on the first convolution kernel.
According to a second aspect of the present disclosure, an image processing method is provided, including:
generating an image processing model for the field of computer vision using the backbone network provided by the first aspect; and
inputting a computer vision image to be processed into the image processing model to obtain an image processing result.
According to a third aspect of the present disclosure, an apparatus for generating a backbone network is provided, where the backbone network is applied to a graphics processing unit, including:
an acquisition module configured to obtain the computational density of multiple convolution kernels of different sizes;
a determination module configured to determine the convolution kernel with the largest computational density as a first convolution kernel; and
a first generation module configured to generate a target backbone network based on the first convolution kernel.
According to a fourth aspect of the present disclosure, an image processing apparatus is provided, including:
a second generation module configured to generate an image processing model for the field of computer vision using the backbone network provided by the first or second aspect; and
an obtaining module configured to input a computer vision image to be processed into the image processing model to obtain an image processing result.
According to a fifth aspect of the present disclosure, an electronic device is provided, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method provided by the first or second aspect.
According to a sixth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to perform the method provided by the first or second aspect.
According to a seventh aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the method provided by the first or second aspect.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The drawings are used for a better understanding of the solution and do not constitute a limitation of the present disclosure. In the drawings:
FIG. 1 shows an exemplary system architecture to which the backbone network generation method of the present disclosure can be applied;
FIG. 2 shows a flowchart of a first embodiment of the backbone network generation method according to the present disclosure;
FIG. 3 shows a flowchart of a second embodiment of the backbone network generation method according to the present disclosure;
FIG. 4 shows an exemplary schematic diagram of a convolutional network generated in an embodiment of the present disclosure;
FIG. 5 shows a flowchart of a third embodiment of the backbone network generation method according to the present disclosure;
FIG. 6 shows a flowchart of a fourth embodiment of the backbone network generation method according to the present disclosure;
FIG. 7 shows a flowchart of an embodiment of the image processing method according to the present disclosure;
FIG. 8 shows a schematic structural diagram of an embodiment of the backbone network generation apparatus according to the present disclosure;
FIG. 9 shows a schematic structural diagram of an embodiment of the image processing apparatus according to the present disclosure;
FIG. 10 shows a block diagram of an electronic device used to implement the backbone network generation method or the image processing method of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments to facilitate understanding, which should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
It should be noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Backbone networks have flourished in academia, but because their actual running speed is not proportional to metrics commonly used in academia such as FLOPs (floating point operations, also called computation amount), only a small number of backbone networks have been adopted in industry.
Among the backbone networks used in industry, some are slowed down in inference because they use a large number of 1×1 convolutions, while others use a large number of DW (depthwise) convolutions, which makes low-level optimization of the backbone network difficult and its accuracy low.
The present disclosure provides a method for generating a backbone network that can be applied to a graphics processing unit (GPU), which can improve the inference speed of the backbone network while maintaining network accuracy, saving GPU hardware resources and reducing costs.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the backbone network generation method or the backbone network generation apparatus of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 100 may include a terminal device 101, a network 102 and a server 103. The network 102 is used to provide a communication link between the terminal device 101 and the server 103, and may include various connection types, for example, wired communication links, wireless communication links, or fiber optic cables.
A user may use the terminal device 101 to interact with the server 103 through the network 102 to receive or send information. Various client applications may be installed on the terminal device 101.
The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices, including but not limited to smartphones, tablet computers, laptop computers and desktop computers. When the terminal device 101 is software, it may be installed in the above electronic devices, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services), or as a single piece of software or software module. No specific limitation is made here.
The backbone network generation method provided by the embodiments of the present disclosure is generally performed by the server 103, and accordingly, the backbone network generation apparatus is generally provided in the server 103.
It should be noted that the numbers of terminal devices 101, networks 102 and servers 103 in FIG. 1 are merely illustrative. There may be any number of terminal devices 101, networks 102 and servers 103 according to implementation needs.
FIG. 2 shows a process 200 of an embodiment of the backbone network generation method according to the present disclosure, which includes the following steps:
Step 201: Obtain the computational density of multiple convolution kernels of different sizes.
In this embodiment, the execution subject of the backbone network generation method, for example, the server 103 shown in FIG. 1, obtains the computational density of multiple convolution kernels of different sizes.
The computational density of the convolution kernels can be used to guide the computational density of the generated backbone network: the larger the computational density of the convolution kernel, the larger the computational density of the backbone network, which performs more floating point operations at the same inference speed and can achieve better data fitting capability.
Exemplarily, the execution subject obtains the computational density of multiple commonly used convolution kernels, for example, Conv (convolution) 5×5, Conv3×3, Conv1×1, DW3×3 and DW5×5 kernels.
Step 202: Determine the convolution kernel with the largest computational density as the first convolution kernel.
In this embodiment, based on the result obtained in step 201, the above execution subject determines the convolution kernel with the largest computational density among the multiple convolution kernels of different sizes as the first convolution kernel. That is, the computational densities of the multiple convolution kernels of different sizes are compared, and the kernel with the largest computational density is determined as the first convolution kernel.
Exemplarily, the computational densities obtained by the execution subject are 546 for the Conv5×5 kernel, 598 for the Conv3×3 kernel, 373 for the Conv1×1 kernel, 26 for the DW3×3 kernel and 58 for the DW5×5 kernel; accordingly, the Conv3×3 kernel can be determined as the first convolution kernel.
Step 203: Generate a target backbone network based on the first convolution kernel.
In this embodiment, the above execution subject generates the target backbone network based on the first convolution kernel determined in step 202.
The embodiments of the present disclosure generate the target backbone network based on the convolution kernel with the largest computational density among multiple kernels of different sizes, greatly increasing the computational density of the backbone network, so that the backbone network performs more floating point operations at the same inference speed and can achieve better data fitting capability.
In the backbone network generation method provided by the embodiments of the present disclosure, the computational densities of multiple convolution kernels of different sizes are first obtained; the kernel with the largest computational density is then determined as the first convolution kernel; and the target backbone network is generated based on the first convolution kernel. Since the target backbone network is generated based on the kernel with the largest computational density, it has a larger computational density and performs more floating point operations at the same inference speed, which greatly improves its data fitting capability.
FIG. 3 shows a process 210 of a second embodiment of the backbone network generation method of the present disclosure. Referring to FIG. 3, the method includes the following steps:
Step 211: Obtain the computational density of multiple convolution kernels of different sizes.
In this embodiment, the execution subject of the backbone network generation method, for example, the server 103 shown in FIG. 1, obtains the computational density of multiple convolution kernels of different sizes.
Step 211 is substantially the same as step 201 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 201, which will not be repeated here.
Step 212: Determine the convolution kernel with the largest computational density as the first convolution kernel.
In this embodiment, based on the result obtained in step 211, the above execution subject determines the convolution kernel with the largest computational density among the multiple kernels of different sizes as the first convolution kernel.
Step 212 is substantially the same as step 202 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 202, which will not be repeated here.
Step 213: Generate a convolution layer based on the first convolution kernel.
In this embodiment, the above execution subject generates a convolution layer based on the first convolution kernel determined in step 212. In a backbone network, convolution layers are used to extract different features of the input through convolution operations. Therefore, improving the computational density, inference speed and data fitting capability of the convolution layers improves the computational density, inference speed and data fitting capability of the backbone network.
Exemplarily, one convolution layer may include one first convolution kernel.
In this embodiment, generating the convolution layer based on the first convolution kernel with the largest computational density can greatly increase the computational density of the convolution layer and the number of floating point operations performed at the same inference speed, thereby improving the inference speed and data fitting capability of the convolution layer and, in turn, of the backbone network.
In some exemplary embodiments, generating the convolution layer based on the first convolution kernel may include: stacking and fusing multiple first convolution kernels to generate a convolutional network including multiple convolution layers.
In this embodiment, the convolutional network generated by stacking and fusing multiple first convolution kernels can be used to enlarge the receptive field in the backbone network, enabling the backbone network to capture a larger receptive field.
In the related art, because the convolution kernels of the convolution layers in the backbone network are small, they cannot capture the full receptive field of an image.
In the embodiments of the present disclosure, stacking multiple first convolution kernels with the largest computational density captures a larger receptive field. Moreover, corresponding features are drawn from the stacked first convolution kernels and concatenated and fused, which not only obtains a larger receptive field but also integrates information from different receptive fields, greatly improving the data fitting capability of the network and providing better feature integration for tasks such as object detection (for example, image processing) that require different receptive fields.
In some exemplary embodiments, the first convolution kernel may be a standard convolution kernel; for example, the first convolution kernel may be a Conv3×3 kernel.
FIG. 4 shows a schematic diagram of a convolutional network 300 generated from the first convolution kernels in an embodiment of the present disclosure. Referring to FIG. 4, in this embodiment, the first convolution kernel 301 is a Conv3×3 kernel, and four Conv3×3 kernels are stacked and fused (Concat). As shown in FIG. 4, the four Conv3×3 kernels are stacked for convolution while fusing receptive fields of 3×3, 5×5, 7×7 and 9×9, forming a convolutional network including multiple convolution layers.
Exemplarily, if the number of channels of each Conv3×3 kernel is 32, the number of channels after stacking and fusing the four Conv3×3 kernels is 128, which is then reduced to 32 by a Conv1×1 kernel. This enlarges the captured receptive field without increasing the number of parameters or the amount of computation, improving the data fitting capability of the convolutional network while maintaining inference speed.
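The receptive-field and channel arithmetic of this example can be checked with a short, framework-free sketch (the branch channel count of 32 is taken from the example above):

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> list[int]:
    """Receptive field after each of num_layers stacked stride-1 convolutions:
    each additional k x k layer grows the field by k - 1."""
    fields = []
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1
        fields.append(rf)
    return fields

# Four stacked Conv3x3 layers fuse receptive fields of 3, 5, 7 and 9.
assert stacked_receptive_field(3, 4) == [3, 5, 7, 9]

# Concatenating the four 32-channel branches gives 128 channels,
# which a Conv1x1 kernel then projects back down to 32.
branch_channels = 32
concat_channels = branch_channels * 4   # 128
after_1x1_reduction = 32
print(concat_channels, after_1x1_reduction)  # 128 32
```

This matches the fusion of 3×3, 5×5, 7×7 and 9×9 receptive fields shown in FIG. 4.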
Step 214: Construct a downsampling layer, where the downsampling layer includes a second convolution kernel of a type different from that of the first convolution kernel.
In a backbone network, downsampling layers can effectively reduce the amount of computation. In the related art, backbone networks applied to graphics processing units use either a two-dimensional pooling (Pool2D) operation or a standard convolution for downsampling. The Pool2D operation has no learnable parameters, which reduces the data fitting capability of the downsampling part; a standard convolution enhances data fitting capability but introduces a large number of parameters and computations, which is unfavorable for inference deployment.
In the embodiments of the present disclosure, a depthwise convolution kernel (DW kernel) is used as the second convolution kernel of the downsampling layer. On the one hand, it has learnable parameters, increasing the data fitting capability of the downsampling layer; on the other hand, it does not increase the number of parameters or the amount of computation, improving the inference speed of the backbone network.
Exemplarily, the second convolution kernel is a DW kernel with a stride of 2.
In this embodiment, since the DW kernel has few parameters, no activation function may be added in the downsampling layer, so as to avoid reducing the data fitting capability of the backbone network.
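As a sketch of why a stride-2 kernel halves the feature map, the standard convolution output-size formula can be applied; a 3×3 kernel with padding 1 is an assumed, typical choice here, as the disclosure does not fix these values:

```python
import math

def conv_output_size(size: int, kernel: int = 3, stride: int = 2,
                     padding: int = 1) -> int:
    """Standard convolution output-size formula,
    out = floor((size + 2*padding - kernel) / stride) + 1,
    applied to a stride-2 depthwise kernel used for downsampling."""
    return math.floor((size + 2 * padding - kernel) / stride) + 1

# Each stride-2 downsampling layer halves the spatial dimensions,
# e.g. between adjacent stages of the convolutional network.
sizes = [224]
for _ in range(3):
    sizes.append(conv_output_size(sizes[-1]))
print(sizes)  # [224, 112, 56, 28]
```

Because each channel of a depthwise kernel only convolves its own input channel, this halving costs far fewer parameters and FLOPs than a stride-2 standard convolution would.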
Step 215: Generate the target backbone network based on the convolution layers and the downsampling layer.
In this embodiment, the above execution subject generates the target backbone network based on the convolution layers generated in step 213 and the downsampling layer constructed in step 214.
In some optional implementations of the embodiments of the present disclosure, the convolutional network includes multiple stages. In this embodiment, step 215 of generating the target backbone network based on the convolutional network and the downsampling layer may include: setting a downsampling layer between every two adjacent stages in the convolutional network to obtain the target backbone network.
The stages of the convolutional network may be divided according to factors such as network function, convolution kernel size and receptive field size.
The backbone network generation method provided by this embodiment uses the first convolution kernel with the largest computational density to generate the convolution layers, which can capture a larger receptive field and integrate information from different receptive fields, and uses a DW kernel with a stride of 2 to construct the downsampling layer, improving data fitting capability and inference speed. The target backbone network generated by the embodiments of the present disclosure can not only capture a larger receptive field, but also has better data fitting capability and inference speed.
FIG. 5 shows a process 220 of a third embodiment of the backbone network generation method of the present disclosure. Referring to FIG. 5, the method includes the following steps:
Step 221: Obtain the floating point operations and latency of multiple convolution kernels of different sizes in the convolutional network of a base network model.
In this embodiment, the above execution subject obtains a base network model and uses the model and its information as an information basis; it then obtains multiple convolution kernels of different sizes from the base network model, together with their floating point operations and latency.
The base network model may be a CNN (convolutional neural network) model currently in use. The execution subject may obtain one base network model or multiple base network models.
The multiple convolution kernels of different sizes may be those most commonly used in the convolutional networks of the one or more base network models. For example, they may include Conv5×5, Conv3×3, Conv1×1, DW3×3 and DW5×5 kernels. The floating point operations (FLOPs) and latency of these kernels obtained by the execution subject are shown in Table 1. It should be noted that the data in Table 1 are the results of stacking multiple layers of each kernel.
Table 1. Floating point operations and latency of multiple convolution kernels of different sizes
Convolution kernel   FLOPs (M)   Latency (ms)   Computational density
Conv5×5              161061      294.73         546
Conv3×3              57982       97.03          598
Conv1×1              6442        17.29          373
DW3×3                113         4.36           26
DW5×5                314         5.43           58
In the embodiments of the present disclosure, the floating point operations and latency of multiple convolution kernels of different sizes are obtained as the basis for calculating the computational density of each kernel.
Step 222: Determine the computational density of the multiple convolution kernels of different sizes according to the floating point operations and latency.
In this embodiment, the above execution subject determines the computational density of each kernel according to the floating point operations and latency obtained in step 221. Exemplarily, the execution subject calculates, for each kernel, the ratio of its floating point operations to its latency, and takes this ratio as the kernel's computational density.
As shown in Table 1 above, in this embodiment the execution subject calculates by this method that the computational density is 546 for the Conv5×5 kernel, 598 for the Conv3×3 kernel, 373 for the Conv1×1 kernel, 26 for the DW3×3 kernel and 58 for the DW5×5 kernel.
Step 223: Determine the convolution kernel with the largest computational density as the first convolution kernel.
In this embodiment, based on the result of step 222, the execution subject determines the kernel with the largest computational density among the multiple kernels of different sizes as the first convolution kernel. For example, in the embodiment shown in Table 1, the Conv3×3 kernel is determined as the first convolution kernel.
Step 223 is substantially the same as step 202 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 202, which will not be repeated here.
Step 224: Generate a convolutional network based on the first convolution kernel.
In this embodiment, the execution subject generates convolution layers based on the first convolution kernel determined in step 223. Step 224 is substantially the same as step 213 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 213, which will not be repeated here.
It should be pointed out that, in the embodiments of the present disclosure, the average number of channels of the convolution kernels in the convolutional network is smaller than the final output channel number of each stage of the convolutional network in the base network model, so as to reduce the number of parameters and the amount of computation of the convolutional network, thereby reducing the number of parameters and the amount of computation of the generated backbone network and improving inference speed.
Step 225: Construct a downsampling layer, where the downsampling layer includes a second convolution kernel of a type different from that of the first convolution kernel.
In this embodiment, the execution subject constructs the downsampling layer with the second convolution kernel to further reduce the amount of computation while improving data fitting capability.
Step 225 is substantially the same as step 214 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 214, which will not be repeated here.
Step 226: Generate the target backbone network based on the convolutional network and the downsampling layer.
In this embodiment, the execution subject generates the target backbone network based on the convolution layers generated in step 224 and the downsampling layer constructed in step 225.
Step 226 is substantially the same as step 215 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 215, which will not be repeated here.
In the backbone network generation method provided by the embodiments of the present disclosure, based on a current base network model, the floating point operations and latency of multiple convolution kernels of different sizes are obtained and used to determine the computational density of the multiple kernels; the kernel with the largest computational density is determined as the first convolution kernel to generate the convolution layers, enlarging the receptive field captured by the convolution layers and improving data fitting capability; and a second convolution kernel of a different type from the first convolution kernel is used to construct the downsampling layer, increasing the data fitting capability of the downsampling layer and improving the inference speed of the backbone network.
FIG. 6 shows a process 230 of a fourth embodiment of the backbone network generation method of the present disclosure. Referring to FIG. 6, the method includes the following steps:
Step 231: Obtain the computational density of multiple convolution kernels of different sizes.
In this embodiment, the execution subject of the backbone network generation method, for example, the server 103 shown in FIG. 1, obtains the computational density of multiple convolution kernels of different sizes.
Step 231 is substantially the same as step 201 or steps 221-222 of the foregoing embodiments; for the specific implementation, reference may be made to the foregoing descriptions of step 201 or steps 221-222, which will not be repeated here.
Step 232: Determine the convolution kernel with the largest computational density as the first convolution kernel.
In this embodiment, based on the result of step 231, the execution subject determines the kernel with the largest computational density among the multiple kernels of different sizes as the first convolution kernel.
Step 232 is substantially the same as step 202 or step 223 of the foregoing embodiments; for the specific implementation, reference may be made to the foregoing descriptions of step 202 or step 223, which will not be repeated here.
Step 233: Generate a convolutional network based on the first convolution kernel.
In this embodiment, the execution subject generates the convolutional network based on the first convolution kernel determined in step 232. Step 233 is substantially the same as step 213 or step 224 of the foregoing embodiments; for the specific implementation, reference may be made to the foregoing descriptions of step 213 or step 224, which will not be repeated here.
Step 234: Construct a downsampling layer, where the downsampling layer includes a second convolution kernel of a type different from that of the first convolution kernel.
In this embodiment, the execution subject constructs the downsampling layer with the second convolution kernel to further reduce the amount of computation while improving data fitting capability.
Step 234 is substantially the same as step 214 of the foregoing embodiment; for the specific implementation, reference may be made to the foregoing description of step 214, which will not be repeated here.
Step 235: After the convolutional network, sequentially construct a global pooling layer, a fully connected layer and a classification layer.
In this embodiment, after constructing the convolutional network, the execution subject sequentially constructs a GAP (global average pooling) layer, an FC (fully connected) layer and a classification layer.
The global pooling layer performs overall average pooling on the feature data of the convolutional network, further reducing the number of parameters. The fully connected layer integrates the highly abstract features obtained after multiple convolutions; normalization can then be performed to output a probability for each classification case, so that the subsequent classification layer can classify according to the probabilities obtained from the fully connected layer.
In the related art, the global pooling layer is directly connected to the classification layer. In the embodiments of the present disclosure, adding a fully connected layer between the global pooling layer and the classification layer introduces very few FLOPs and does not affect inference speed, but can greatly improve the final accuracy of the backbone network.
Step 236: Generate the target backbone network based on the convolutional network, the downsampling layers, the global pooling layer, the fully connected layer and the classification layer.
In this embodiment, the execution subject generates the final target backbone network based on the convolution layers, downsampling layers, global pooling layer, fully connected layer and classification layer generated and constructed in steps 233-235. For example, the downsampling layers are arranged between adjacent stages of the convolution layers; the output of the convolutional network including multiple convolution layers serves as the input of the global pooling layer; the output of the global pooling layer serves as the input of the fully connected layer; the output of the fully connected layer serves as the input of the classification layer; and the output of the classification layer serves as the output of the backbone network.
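The layer ordering of steps 233-236 can be summarized in a small, framework-free sketch; the stage count and per-stage layer count used here are illustrative assumptions, not values fixed by the disclosure:

```python
# Structural sketch of the generated backbone: stages of stacked Conv3x3
# layers, a stride-2 depthwise downsampling layer between every two
# adjacent stages, then global average pooling, a fully connected layer
# and a classification layer, in that order.
def build_backbone(num_stages: int = 4, convs_per_stage: int = 4) -> list[str]:
    layers: list[str] = []
    for stage in range(num_stages):
        layers += [f"stage{stage}_conv3x3"] * convs_per_stage
        if stage < num_stages - 1:  # only between adjacent stages
            layers.append(f"down{stage}_dw3x3_s2")
    layers += ["global_avg_pool", "fully_connected", "classifier"]
    return layers

backbone = build_backbone()
print(backbone[-3:])  # ['global_avg_pool', 'fully_connected', 'classifier']
```

Note that four stages yield only three downsampling layers, since a downsampling layer sits only between adjacent stages, and the fully connected layer is inserted between pooling and classification as described above.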
In the backbone network generation method provided by the embodiments of the present disclosure, based on a current base network model, the floating point operations and latency of multiple convolution kernels of different sizes are obtained and used to determine the computational density of the multiple kernels; the kernel with the largest computational density is determined as the first convolution kernel to generate the convolution layers, enlarging the receptive field captured by the convolutional network and improving data fitting capability; a downsampling layer is then constructed between two adjacent stages of the convolutional network using a second convolution kernel of a different type from the first convolution kernel, increasing the data fitting capability of the downsampling layer and improving inference speed; and a fully connected layer is constructed between the global pooling layer and the classification layer, greatly improving the final accuracy of the backbone network.
It should be pointed out that, in any of the embodiments shown in FIGS. 2-6, as well as in embodiments not shown in the present disclosure but implementable according to the backbone network generation method of the present disclosure, the generated backbone network can be used to build machine learning models in the field of computer vision, for example, machine learning models in the field of object detection.
FIG. 7 shows a process 400 of an embodiment of the image processing method according to the present disclosure. Referring to FIG. 7, the image processing method includes the following steps:
Step 401: Use the backbone network to generate an image processing model for the field of computer vision.
In this embodiment, the execution subject of the image processing method (for example, the server 103 shown in FIG. 1) may use the backbone network to generate an image processing model for the field of computer vision.
The backbone network used by the execution subject may be a backbone network generated according to the backbone network generation method of the present disclosure described above. For example, the backbone network may include convolution layers generated from the first convolution kernel with the largest computational density, a downsampling layer constructed based on a second convolution kernel of a type different from the first convolution kernel, a global pooling layer, a fully connected layer and a classification layer, where multiple convolution layers form a convolutional network and a downsampling layer is set between two adjacent stages of the convolutional network.
Step 402: Input the computer vision image to be processed into the image processing model to obtain the image processing result.
In this embodiment, the execution subject inputs the computer vision image to be processed directly into the image processing model; the image processing model extracts image features based on the backbone network, processes them, and outputs the image processing result.
The computer vision image to be processed may be selected and uploaded by the user from existing images, or taken by the user through the camera of a terminal device, and may contain an image of any person or thing, which is not specifically limited in this embodiment.
The backbone network serves as the basic feature extractor for the object detection task; the main task of object detection is to take an image as input and output the feature map of the corresponding input image. After the computer vision image to be processed is input into the image processing model, the backbone network performs image segmentation on the input image to obtain some original regions, extracts the image features in those regions, classifies the extracted features, and finally obtains the detected target object.
The image processing method provided by the embodiments of the present disclosure generates an image processing model for the field of computer vision based on the backbone network generated by the backbone network generation method provided by the present disclosure; the computer vision image to be processed is then input into the image processing model to obtain the image processing result. The image processing method of this embodiment uses the backbone network generated by the foregoing method to extract and process image features, which improves the speed and accuracy of feature extraction and processing, thereby improving image processing efficiency and effectiveness.
As an implementation of the methods shown in the above figures, FIG. 8 shows an embodiment of a backbone network generation apparatus according to the present disclosure. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
Referring to FIG. 8, the backbone network generation apparatus 500 includes: an acquisition module 501, a determination module 502 and a first generation module 503. The acquisition module 501 is configured to obtain the computational density of multiple convolution kernels of different sizes; the determination module 502 is configured to determine the convolution kernel with the largest computational density as the first convolution kernel; and the first generation module 503 is configured to generate the target backbone network based on the first convolution kernel.
In this embodiment, for the specific processing of the acquisition module 501, the determination module 502 and the first generation module 503 of the backbone network generation apparatus 500 and the technical effects they bring, reference may be made to the descriptions of steps 201-203 in the corresponding embodiment of FIG. 2, which will not be repeated here.
In some optional implementations of the embodiments of the present disclosure, the first generation module includes: a first generation sub-module, a first construction sub-module and a second generation sub-module. The first generation sub-module is configured to generate a convolution layer based on the first convolution kernel; the first construction sub-module is configured to construct a downsampling layer, where the downsampling layer includes a second convolution kernel of a type different from the first convolution kernel; and the second generation sub-module is configured to generate the target backbone network based on the convolution layer and the downsampling layer.
In this embodiment, for the specific processing of the first generation sub-module, the first construction sub-module and the second generation sub-module and the technical effects they bring, reference may be made to the descriptions of steps 213-215 in the corresponding embodiment of FIG. 3, which will not be repeated here.
In some optional implementations of the embodiments of the present disclosure, the first convolution kernel is a 3×3 standard convolution kernel, and the second convolution kernel is a depthwise convolution kernel with a stride of 2.
In some optional implementations of the embodiments of the present disclosure, the first generation sub-module is configured to stack and fuse multiple first convolution kernels to generate a convolutional network including multiple convolution layers.
In some optional implementations of the embodiments of the present disclosure, the convolutional network includes multiple stages, and the second generation sub-module is configured to set a downsampling layer between every two adjacent stages in the convolutional network to obtain the target backbone network.
In some optional implementations of the embodiments of the present disclosure, the acquisition module includes: an acquisition sub-module and a first determination sub-module. The acquisition sub-module is configured to obtain the floating point operations and latency of multiple convolution kernels of different sizes in the convolutional network of the base network model; the first determination sub-module is configured to determine the computational density of the multiple convolution kernels of different sizes according to the floating point operations and the latency.
In this embodiment, for the specific processing of the acquisition sub-module and the first determination sub-module and the technical effects they bring, reference may be made to the descriptions of steps 221-222 in the corresponding embodiment of FIG. 5, which will not be repeated here.
In some optional implementations of the embodiments of the present disclosure, the average number of channels of the convolution kernels in the convolutional network is smaller than the final output channel number of the last stage of the convolutional network in the base network model.
In some optional implementations of the embodiments of the present disclosure, the first generation module further includes a second construction sub-module configured to sequentially construct a global pooling layer, a fully connected layer and a classification layer after the convolutional network.
In this embodiment, for the specific processing of the second construction sub-module and the technical effects it brings, reference may be made to the description of step 235 in the corresponding embodiment of FIG. 6, which will not be repeated here.
In the backbone network generation apparatus provided by the embodiments of the present disclosure, the backbone network is used to build machine learning models in the field of computer vision.
As an implementation of the methods shown in the above figures, FIG. 9 shows an embodiment of an image processing apparatus provided according to the present disclosure. This apparatus embodiment corresponds to the method embodiment shown in FIG. 7, and the apparatus can be applied to various electronic devices.
Referring to FIG. 9, the image processing apparatus 600 includes: a second generation module 601 and an obtaining module 602. The second generation module 601 is configured to generate an image processing model for the field of computer vision using the backbone network provided by the first or second aspect; the obtaining module 602 is configured to input the computer vision image to be processed into the image processing model to obtain the image processing result.
In this embodiment, for the specific processing of the second generation module 601 and the obtaining module 602 of the image processing apparatus 600 and the technical effects they bring, reference may be made to the descriptions of steps 401-402 in the corresponding embodiment of FIG. 7, which will not be repeated here.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.
The electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the above backbone network generation method.
In some embodiments, a non-transitory computer-readable storage medium stores computer instructions used to cause a computer to perform the above backbone network generation method.
In some embodiments, a computer program product includes a computer program that, when executed by a processor, implements the above backbone network generation method.
FIG. 10 shows a schematic block diagram of an example electronic device 700 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 10, the device 700 includes a computing unit 701, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Multiple components in the device 700 are connected to the I/O interface 705, including: an input unit 706, such as a keyboard or a mouse; an output unit 707, such as various types of displays and speakers; a storage unit 708, such as a magnetic disk or an optical disk; and a communication unit 709, such as a network card, a modem or a wireless communication transceiver. The communication unit 709 allows the device 700 to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunication networks.
The computing unit 701 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc. The computing unit 701 performs the various methods and processes described above, such as the backbone network generation method or the image processing method. For example, in some embodiments, the backbone network generation method or the image processing method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the backbone network generation method or the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the backbone network generation method or the image processing method in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor, and which may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, voice input, or tactile input.
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include: local area networks (LAN), wide area networks (WAN), and the Internet.
A computer system may include clients and servers. Clients and servers are generally remote from each other and typically interact through a communication network. The relationship of client and server is created by computer programs running on the corresponding computers and having a client-server relationship with each other. The server may be a cloud server, a distributed system server, or a server combined with a blockchain.
It should be understood that steps may be reordered, added or deleted using the various forms of processes shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in a different order, as long as the desired results of the technical solution disclosed in the present disclosure can be achieved; no limitation is made herein.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (23)

  1. A method for generating a backbone network, the backbone network being applied to a graphics processing unit, comprising:
    obtaining the computational density of multiple convolution kernels of different sizes;
    determining the convolution kernel with the largest computational density as a first convolution kernel; and
    generating a target backbone network based on the first convolution kernel.
  2. The generation method according to claim 1, wherein the generating a target backbone network based on the first convolution kernel comprises:
    generating a convolution layer based on the first convolution kernel;
    constructing a downsampling layer, the downsampling layer comprising a second convolution kernel, the type of the second convolution kernel being different from the type of the first convolution kernel; and
    generating the target backbone network according to the convolution layer and the downsampling layer.
  3. The generation method according to claim 2, wherein the generating a convolution layer based on the first convolution kernel comprises:
    stacking and fusing a plurality of the first convolution kernels to generate a convolutional network comprising a plurality of the convolution layers.
  4. The generation method according to claim 3, wherein the convolutional network comprises a plurality of stages;
    the generating the target backbone network according to the convolution layer and the downsampling layer comprises:
    setting the downsampling layer between every two adjacent stages of the plurality of stages to obtain the target backbone network.
  5. The generation method according to claim 3, wherein the obtaining the computational density of multiple convolution kernels of different sizes comprises:
    obtaining the floating point operations and latency of multiple convolution kernels of different sizes in the convolutional network of a base network model; and
    determining the computational density of the multiple convolution kernels of different sizes according to the floating point operations and the latency.
  6. The generation method according to claim 5, wherein the average number of channels of the convolution kernels in the convolutional network is smaller than the final output channel number of each stage of the convolutional network in the base network model.
  7. The generation method according to claim 3, wherein the generating a target backbone network based on the first convolution kernel further comprises:
    sequentially constructing a global pooling layer, a fully connected layer and a classification layer after the convolutional network.
  8. The generation method according to claim 1, wherein the first convolution kernel is a 3×3 standard convolution kernel, and the second convolution kernel is a depthwise convolution kernel with a stride of 2.
  9. The generation method according to claim 1, wherein the backbone network is used to build a machine learning model in the field of computer vision.
  10. An image processing method, comprising:
    generating an image processing model for the field of computer vision using the backbone network according to any one of claims 1-9; and
    inputting a computer vision image to be processed into the image processing model to obtain an image processing result.
  11. An apparatus for generating a backbone network, the backbone network being applied to a vision processor, comprising:
    an obtaining module configured to obtain the compute density of a plurality of convolution kernels of different sizes;
    a determining module configured to determine the convolution kernel with the highest compute density as a first convolution kernel; and
    a first generation module configured to generate a target backbone network based on the first convolution kernel.
  12. The generation apparatus according to claim 11, wherein the first generation module comprises:
    a first generation sub-module configured to generate a convolutional layer based on the first convolution kernel;
    a first construction sub-module configured to construct a downsampling layer, the downsampling layer comprising a second convolution kernel, a type of the second convolution kernel being different from a type of the first convolution kernel; and
    a second generation sub-module configured to generate the target backbone network according to the convolutional layer and the downsampling layer.
  13. The generation apparatus according to claim 12, wherein the first generation sub-module is configured to stack and fuse a plurality of the first convolution kernels to generate a convolutional network comprising a plurality of the convolutional layers.
  14. The generation apparatus according to claim 13, wherein the convolutional network comprises a plurality of stages; and the second generation sub-module is configured to arrange the downsampling layer between every two adjacent stages of the plurality of stages to obtain the target backbone network.
  15. The generation apparatus according to claim 13, wherein the obtaining module comprises:
    an obtaining sub-module configured to obtain floating-point operation counts and latencies of a plurality of convolution kernels of different sizes in a convolutional network of a base network model; and
    a first determining sub-module configured to determine the compute density of the plurality of convolution kernels of different sizes according to the floating-point operation counts and the latencies.
  16. The generation apparatus according to claim 15, wherein an average channel number of the convolution kernels in the convolutional network is smaller than a final output channel number of each stage of the convolutional network in the base network model.
  17. The generation apparatus according to claim 13, wherein the first generation module further comprises:
    a second construction sub-module configured to construct, after the convolutional network, a global pooling layer, a fully connected layer, and a classification layer in sequence.
  18. The generation apparatus according to claim 11, wherein the first convolution kernel is a 3×3 standard convolution kernel, and the second convolution kernel is a depthwise convolution kernel with a stride of 2.
  19. The generation apparatus according to claim 11, wherein the backbone network is used to construct machine learning models in the field of computer vision.
  20. An image processing apparatus, comprising:
    a second generation module configured to generate an image processing model for the field of computer vision using the backbone network according to any one of claims 1-9; and
    a result module configured to input a computer vision image to be processed into the image processing model to obtain an image processing result.
  21. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-10.
  22. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1-10.
  23. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
PCT/CN2022/130496 2022-05-18 2022-11-08 Backbone network generation method, apparatus, device, and storage medium WO2023221415A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210551168.6A CN114897147B (zh) 2022-05-18 2022-05-18 Backbone network generation method, apparatus, device, and storage medium
CN202210551168.6 2022-05-18

Publications (1)

Publication Number Publication Date
WO2023221415A1 true WO2023221415A1 (zh) 2023-11-23

Family

ID=82724224

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130496 WO2023221415A1 (zh) 2022-05-18 2022-11-08 Backbone network generation method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114897147B (zh)
WO (1) WO2023221415A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897147B (zh) * 2022-05-18 2023-06-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Backbone network generation method, apparatus, device, and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN112085652A (zh) * 2019-06-14 2020-12-15 Shenzhen ZTE Microelectronics Technology Co., Ltd. Image processing method and apparatus, computer storage medium, and terminal
US20220019843A1 (en) * 2020-07-14 2022-01-20 Flir Unmanned Aerial Systems Ulc Efficient refinement neural network for real-time generic object-detection systems and methods
CN114897147A (zh) * 2022-05-18 2022-08-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Backbone network generation method, apparatus, device, and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN110647974A (zh) * 2018-06-27 2020-01-03 Hangzhou Hikvision Digital Technology Co., Ltd. Network layer operation method and apparatus in deep neural network
CN110991317B (zh) * 2019-11-29 2023-05-16 Sun Yat-sen University Crowd counting method based on multi-scale perspective-aware network
CN113168429A (zh) * 2020-05-11 2021-07-23 SZ DJI Technology Co., Ltd. Convolution computing apparatus, method, and computer storage medium
CN111652903B (zh) * 2020-05-22 2023-09-08 Chongqing University of Technology Pedestrian target tracking method based on convolutional correlation network in autonomous driving scenarios
CN113420824B (zh) * 2021-07-03 2024-06-28 Shanghai Ideal Information Industry (Group) Co., Ltd. Pre-training data screening and training method and system for industrial vision applications

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN112085652A (zh) * 2019-06-14 2020-12-15 Shenzhen ZTE Microelectronics Technology Co., Ltd. Image processing method and apparatus, computer storage medium, and terminal
US20220019843A1 (en) * 2020-07-14 2022-01-20 Flir Unmanned Aerial Systems Ulc Efficient refinement neural network for real-time generic object-detection systems and methods
CN114897147A (zh) * 2022-05-18 2022-08-12 Beijing Baidu Netcom Science and Technology Co., Ltd. Backbone network generation method, apparatus, device, and storage medium

Non-Patent Citations (2)

Title
"Master's Theses", 1 June 2022, TIANJIN UNIVERSITY, China, article YU, HAONAN: "Research on 3D Face Recognition Based on Data Augmentation", pages: 1 - 70, XP009550371, DOI: 10.27356/d.cnki.gtjdu.2020.001227 *
HAOLIN CHEN, GAO SHANGBING, XIANG LIN, CAI CHUANGXIN, WANG CHANGCHUN: "FIRE-DET: an efficient flame detection model", JOURNAL OF NANJING UNIVERSITY OF INFORMATION SCIENCE & TECHNOLOGY (NATURAL SCIENCE), vol. 15, no. 1, 20 December 2021 (2021-12-20), pages 76-84, XP093108188 *

Also Published As

Publication number Publication date
CN114897147A (zh) 2022-08-12
CN114897147B (zh) 2023-06-06

Similar Documents

Publication Publication Date Title
US20220147822A1 (en) Training method and apparatus for target detection model, device and storage medium
WO2022156561A1 (zh) Natural language processing method and apparatus
US20220415072A1 (en) Image processing method, text recognition method and apparatus
CN112990219B (zh) Method and apparatus for image semantic segmentation
JP2022135991A (ja) Training method, apparatus, device, and storage medium for cross-modal retrieval model
JP2023531350A (ja) Method for incrementing sample images, method for training image detection model, and image detection method
KR20230139296A (ko) Method and apparatus for training point cloud processing model and for point cloud instance segmentation
CN114792355B (zh) Virtual avatar generation method and apparatus, electronic device, and storage medium
CN112528995B (zh) Method for training target detection model, target detection method, and apparatus
EP4020387A2 (en) Target tracking method and device, and electronic apparatus
CN115690443B (zh) Feature extraction model training method, image classification method, and related apparatus
US20220398834A1 (en) Method and apparatus for transfer learning
WO2023221415A1 (zh) Backbone network generation method, apparatus, device, and storage medium
CN115511779B (zh) Image detection method and apparatus, electronic device, and storage medium
CN115456167B (zh) Lightweight model training method, image processing method, apparatus, and electronic device
US20240135698A1 (en) Image classification method, model training method, device, storage medium, and computer program
EP4095761A1 (en) Method for generating backbone network, apparatus for generating backbone network, device, and storage medium
WO2023015942A1 (zh) Method and apparatus for determining image features, electronic device, and storage medium
WO2023019996A1 (zh) Image feature fusion method and apparatus, electronic device, and storage medium
CN114238611B (zh) Method, apparatus, device, and storage medium for outputting information
JP2024537258A (ja) Voice wake-up method, apparatus, electronic device, storage medium, and computer program
CN116229095A (zh) Model training method, vision task processing method, apparatus, and device
CN116152702A (zh) Point cloud label acquisition method and apparatus, electronic device, and autonomous vehicle
US20220113943A1 (en) Method for multiply-add operations for neural network
CN112559727B (zh) Method, apparatus, device, storage medium, and program for outputting information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22942436

Country of ref document: EP

Kind code of ref document: A1