WO2023173552A1 - Method for establishing a target detection model, application method, device, apparatus and medium - Google Patents

Method for establishing a target detection model, application method, device, apparatus and medium

Info

Publication number
WO2023173552A1
Authority
WO
WIPO (PCT)
Prior art keywords
target detection
detection model
network
depth
separable
Prior art date
Application number
PCT/CN2022/090664
Other languages
English (en)
French (fr)
Inventor
郑喜民
贾云舒
周成昊
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023173552A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present application relates to the fields of computer vision and image recognition in artificial intelligence, and in particular to a method of establishing a target detection model, an application method, equipment, devices and media.
  • Target detection is widely used in artificial intelligence, robot navigation, intelligent video surveillance, industrial inspection, aerospace and many other fields. It is also a prerequisite algorithm for many vision tasks and plays a crucial role in subsequent tasks such as face recognition, gait recognition, crowd counting and instance segmentation. With the widespread use of deep learning, target detection algorithms have developed rapidly.
  • deep-learning-based target detection algorithms fall mainly into two categories: (1) two-stage target detection, where the first stage generates candidate regions containing approximate location information of the target and the second stage classifies the candidate regions and refines their locations; and (2) single-stage target detection, which directly generates the category probability and corresponding position coordinates of the object.
  • embodiments of the present application provide a method for establishing a target detection model, including: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting a feature map; performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and using the NetAdapt algorithm and a pruning algorithm to optimize the intermediate target detection model to obtain the final target detection model.
  • embodiments of the present application provide a method for applying a target detection model, including: acquiring an actual digital image and inputting the actual digital image into the final target detection model; extracting features from the actual digital image through the depth-separable convolutional layers of the target detection model and outputting a feature map; and performing target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  • embodiments of the present application provide a device for establishing a target detection model, including: a network modification module, used to obtain a basic target detection network, replace the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and add a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model;
  • a digital image acquisition module, used to obtain a preset digital image and input the preset digital image into the initial target detection model;
  • a feature extraction module used to extract features of the preset digital image through the depth-separable convolution layer of the initial target detection model, and output a feature map;
  • a target detection module, used to perform target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
  • a model optimization module, used to optimize the intermediate target detection model using the NetAdapt algorithm and the pruning algorithm to obtain the final target detection model.
  • embodiments of the present application provide a target detection device, including: a digital image acquisition module, used to acquire an actual digital image and input the actual digital image into the target detection model; a feature extraction module, used to perform feature extraction on the actual digital image through the depth-separable convolutional layers of the target detection model and output a feature map; and a target detection module, used to perform target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  • embodiments of the present application provide a target detection device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when executing the computer program, the processor implements a method of establishing a target detection model and/or a method of applying a target detection model, wherein the method of establishing the target detection model includes: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting a feature map; and performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model.
  • embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute a method for establishing a target detection model and/or a method for applying a target detection model.
  • the method of establishing the target detection model includes: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting a feature map; performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and using the NetAdapt algorithm and the pruning algorithm to optimize the intermediate target detection model to obtain the final target detection model; wherein the method of applying the target detection model includes: obtaining an actual digital image and inputting the actual digital image into the target detection model; performing feature extraction on the actual digital image through the depth-separable convolutional layers of the target detection model and outputting a feature map; and performing target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  • the establishment method, application method, equipment, device and medium of the target detection model proposed by the embodiments of this application obtain a basic target detection network, replace its ordinary convolutional layers with depth-separable convolutional layers, and add a multi-scale feature fusion mechanism to obtain an initial target detection model; a preset digital image is obtained and input into the initial target detection model; features of the preset digital image are extracted through the depth-separable convolutional layers of the initial target detection model and a feature map is output; target detection is performed on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and the NetAdapt algorithm and the pruning algorithm are used to optimize the intermediate target detection model to obtain the final target detection model.
  • This application replaces the ordinary convolution layer in the basic target detection network with a depth-separable convolution layer, and adds a multi-scale feature fusion mechanism to obtain an initial target detection model.
  • the initial target detection model uses depth-separable convolution to extract features. Compared with ordinary convolution, depth-separable convolution has fewer parameters, which reduces the data processing burden on the processor of an embedded device; conversely, for the same number of parameters, a network built with depth-separable convolutions can be made deeper. Adding a multi-scale feature fusion mechanism enables the target detection model to learn deep and shallow features simultaneously during target detection, which improves feature expression and enhances target detection accuracy.
  • the NetAdapt algorithm and the pruning algorithm are used to optimize the intermediate target detection model to obtain the final target detection model.
  • the NetAdapt algorithm and the pruning algorithm miniaturize the intermediate target detection model to obtain the final target detection model, accelerating inference. The final target detection model can run on different embedded devices with a faster overall detection speed, improving the effect of target detection on embedded devices.
  • Figure 1 is a schematic diagram of a system architecture platform for establishing a target detection model and applying the target detection model provided by an embodiment of the present application;
  • Figure 2 is a flow chart of a method for establishing a target detection model provided by an embodiment of the present application
  • Figure 3 is a flow chart of the feature extraction method provided by the embodiment of the present application.
  • Figure 4 is a flow chart of a method for target detection using a multi-scale feature fusion mechanism provided by an embodiment of the present application
  • Figure 5 is a flow chart of the NetAdapt algorithm optimization process provided by the embodiment of the present application.
  • Figure 6 is a flow chart of the pruning algorithm optimization process provided by the embodiment of the present application.
  • Figure 7 is a flow chart of the application method of the target detection model provided by the embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a target detection model establishing device provided by an embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a target detection device provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. As a branch of computer science, AI attempts to understand the essence of intelligence and to produce intelligent machines that can respond in a manner similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. AI can simulate the information processes of human consciousness and thinking; it encompasses the theories, methods, technologies and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Natural Language Processing (NLP) uses computers to process, understand and apply human languages (such as Chinese and English). NLP is a branch of artificial intelligence and an interdisciplinary subject of computer science and linguistics, also called computational linguistics. It includes syntax analysis, semantic analysis and text understanding, and is commonly applied in machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, and public opinion analysis and opinion mining. It involves data mining, machine learning, knowledge acquisition, knowledge engineering and artificial intelligence research related to language processing, as well as linguistic research related to language computing.
  • Information Extraction (including named entity recognition, NER) is a text processing technology that extracts specified types of factual information, such as entities, relationships and events, from natural language text and outputs it as structured data.
  • Information extraction is a technique for extracting specific information from text data.
  • Text data is composed of some specific units, such as sentences, paragraphs, and chapters.
  • Text information is composed of small specific units, such as characters, words, phrases, sentences, paragraphs, or combinations of these units.
  • Extracting noun phrases, person names, place names, etc. from text data is text information extraction.
  • the information extracted by text information extraction technology can be various types of information.
  • target detection is a popular direction in the fields of computer vision and digital image processing. It is widely used in artificial intelligence, robot navigation, intelligent video surveillance, industrial inspection, aerospace and many other fields. It is also a prerequisite algorithm for many vision tasks and plays a vital role in subsequent tasks such as face recognition, gait recognition, crowd counting and instance segmentation. With the widespread use of deep learning, target detection algorithms have developed rapidly.
  • deep-learning-based target detection algorithms fall mainly into two categories: (1) two-stage target detection, where the first stage generates candidate regions containing approximate location information of the target and the second stage classifies the candidate regions and refines their locations; and (2) single-stage target detection, which directly generates the category probability and corresponding position coordinates of the object.
  • existing single-stage algorithms do not need to generate candidate regions, so the overall process is simpler and faster, but the accuracy is not high enough; two-stage algorithms ensure accuracy but are not fast enough;
  • the number of network parameters of existing target detection algorithms is relatively large. In practical applications they rely on large computers such as servers to achieve real-time detection, but it is difficult to achieve this effect when the network structure is ported to embedded devices such as mobile phones, because the processor performance of embedded devices is far inferior to that of servers.
  • embodiments of the present application provide a target detection model establishment method, application method, equipment, device and medium, which can effectively improve the target detection efficiency of embedded devices.
  • Artificial Intelligence (AI) uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometric technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the establishment method and application method of the target detection model provided by the embodiments of this application relate to the fields of artificial intelligence and digital medical technology.
  • the establishment method and application method of the target detection model provided by the embodiments of the present application can be applied to the terminal or the server side, or can be software running in the terminal or the server side.
  • the terminal can be a smartphone, a tablet, a laptop, a desktop computer, etc.
  • the server can be configured as an independent physical server, or as a server cluster or distributed system composed of multiple physical servers.
  • the server can also be a cloud server configured to provide basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • Server software can be an application that implements a target detection model, etc., but is not limited to the above forms.
  • the application may be used in a variety of general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
  • the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • Figure 1 is a schematic diagram of a system architecture platform for establishing a target detection model and applying the target detection model provided by an embodiment of the present application.
  • the system architecture platform 100 in this embodiment of the present application includes one or more processors 110 and a memory 120.
  • in Figure 1, one processor 110 and one memory 120 are taken as an example.
  • the processor 110 and the memory 120 may be connected through a bus or other means.
  • the connection through a bus is taken as an example.
  • the memory 120 can be used to store non-transitory software programs and non-transitory computer executable programs.
  • the memory 120 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • the memory 120 may optionally include memory located remotely relative to the processor 110; such remote memories may be connected to the system architecture platform 100 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • the device structure shown in Figure 1 does not constitute a limitation on the system architecture platform 100, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
  • Figure 2 is a flow chart of a method for establishing a target detection model provided by an embodiment of the present application.
  • the method of establishing a target detection model according to the embodiment of the present application includes but is not limited to step S200, step S210, step S220, Step S230, step S240 and step S250.
  • Step S200 obtain the basic target detection network
  • Step S210 replace the ordinary convolutional layer of the basic target detection network with a depth-separable convolutional layer, and add a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model;
  • Step S220 obtain a preset digital image and input the preset digital image into the initial target detection model
  • Step S230 perform feature extraction on the preset digital image through the depth-separable convolution layer of the initial target detection model, and output a feature map
  • Step S240 Perform target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model
  • Step S250 Use the NetAdapt algorithm and the pruning algorithm to optimize the intermediate target detection model to obtain the final target detection model.
  • a basic target detection network is obtained, and the backbone structure of the basic target detection network is replaced with a lightweight neural network.
  • a lightweight neural network refers to a neural network model that requires fewer parameters and a lower computational cost. Because of their small computational overhead, lightweight neural network models can be deployed on devices with limited computing resources, such as smartphones, tablets or other embedded devices.
  • depth-separable convolution performs an independent convolution for each channel, and then weights and combines the resulting feature maps in the depth direction through pointwise convolution to generate a new feature map.
  • its number of parameters is less than that of ordinary convolution, which effectively reduces the data processing burden on the embedded device's processor. For the same number of parameters, a network using depth-separable convolutions can be made deeper, greatly improving the feature extraction effect for target detection.
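As an illustration of the parameter saving described above (not part of the patent text), the parameter counts of an ordinary convolution and a depth-separable one can be compared directly from their definitions; the layer sizes below are arbitrary examples:

```python
def conv_params(k, c_in, c_out):
    """Parameters of an ordinary k x k convolution (bias terms omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise (one k x k kernel per input channel) plus pointwise (1 x 1)."""
    return k * k * c_in + c_in * c_out

# Example: a 3x3 layer mapping 128 channels to 256 channels.
ordinary = conv_params(3, 128, 256)                   # 294912
separable = depthwise_separable_params(3, 128, 256)   # 33920
print(ordinary, separable, round(ordinary / separable, 1))
```

For this example layer the depth-separable version needs roughly 8.7 times fewer parameters, which is the saving the passage above attributes to the replacement.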
  • adding the multi-scale feature fusion mechanism to the basic target detection network enables the target detection model to learn both deep and shallow features when detecting targets, so the features are expressed better and the target detection accuracy is enhanced.
  • the NetAdapt (Platform-Aware Neural Network Adaptation for Mobile Applications) algorithm and the pruning algorithm miniaturize the intermediate target detection model to obtain the final target detection model, achieving the purpose of accelerating inference. The final target detection model can run on different embedded devices, and the overall detection speed is faster, improving the effect of target detection on embedded devices.
  • the NetAdapt algorithm optimizes the convolution kernel of the depth-separable convolution layer of the intermediate target detection model; the pruning algorithm optimizes the network structure of the intermediate target detection model.
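The patent does not spell out the pruning procedure; as one common variant, magnitude-based weight pruning zeroes the smallest-magnitude weights of a layer. The sketch below (illustrative only; the function name and the 50% sparsity are assumptions, and the full NetAdapt resource-constrained loop is not shown) demonstrates the idea:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.5, -0.05, 0.3],
              [-0.01, 0.8, 0.02]])
pruned = prune_by_magnitude(w, 0.5)   # [[0.5, 0, 0.3], [0, 0.8, 0]]
```

In a NetAdapt-style loop, a step like this would be applied layer by layer, followed by short fine-tuning, until the model meets the target latency or size budget on the embedded device.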
  • the above model training is a conventional model training process.
  • Figure 3 is a flow chart of the feature extraction method provided by the embodiment of the present application.
  • the feature extraction method of the embodiment of the present application includes but is not limited to step S300, step S310 and step S320.
  • Step S300 Use point-by-point convolution to perform channel dimension upscaling processing on the preset digital image
  • Step S310 use depth convolution to perform feature extraction processing on the preset digital image after the channel dimension is increased, and obtain multiple initial feature maps
  • Step S320 Use point-by-point convolution to perform channel dimensionality reduction processing on multiple initial feature maps, and output the final feature map.
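Steps S300 to S320 can be sketched in NumPy as pointwise expansion, per-channel depthwise convolution, and pointwise reduction. This is a minimal illustration, not the patent's implementation: the 3x3 kernel size, "same" zero padding, the channel sizes, and the expansion factor T = 2 are assumptions, and activations and normalization are omitted:

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def depthwise_conv(x, kernels):
    """3x3 depthwise convolution: one kernel per channel, 'same' padding."""
    c, h, wdt = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for ch in range(c):            # each kernel convolves only its own channel
        for i in range(h):
            for j in range(wdt):
                out[ch, i, j] = np.sum(padded[ch, i:i+3, j:j+3] * kernels[ch])
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))        # preset image: 3 channels, 8x8
t = 2                                      # channel expansion coefficient T
w_up = rng.standard_normal((3 * t, 3))     # S300: raise channels 3 -> 6
dw = rng.standard_normal((3 * t, 3, 3))    # S310: one 3x3 kernel per channel
w_down = rng.standard_normal((4, 3 * t))   # S320: reduce channels 6 -> 4

feat = pointwise_conv(depthwise_conv(pointwise_conv(x, w_up), dw), w_down)
print(feat.shape)  # (4, 8, 8)
```

The spatial size is preserved throughout; only the channel dimension is expanded and then reduced, matching the up/down sequence described in the steps above.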
  • the features of the preset digital image are extracted through the depthwise separable convolution layer, and the final feature map is output.
  • depthwise separable convolution (Depthwise Separable Convolution) mainly consists of depthwise convolution (Depthwise Convolution) and pointwise convolution (Pointwise Convolution). Compared with ordinary convolution, depth-separable convolution performs fewer operations while achieving the same feature extraction effect, with fewer parameters; for the same number of parameters, the network can be made deeper than with ordinary convolution, greatly improving the feature extraction effect for target detection.
  • after the depth-separable convolution layer obtains the preset digital image, it first uses pointwise convolution to raise the channel dimension of the preset digital image. Because of its computing characteristics, depthwise convolution cannot change the number of channels: a layer can only output as many channels as it is given. Therefore, if the number of channels provided by the previous layer is very small, depthwise convolution can only extract features in a low-dimensional space, and the effect is not good enough.
  • pointwise convolution is therefore performed before depthwise convolution to raise the channel dimension of the image, and a channel-dimension expansion coefficient T is defined, so that regardless of the number of input channels, after pointwise convolution raises the channel dimension, depthwise convolution can efficiently extract features in a relatively higher-dimensional space.
  • depthwise convolution is then used to extract features from the preset digital image after the channel dimension has been raised. Since each convolution kernel of depthwise convolution corresponds one-to-one with an input channel (one kernel is responsible for one input channel, and one input channel is convolved by only one kernel), the number of initial feature maps produced equals the number of convolution kernels. Because pointwise convolution has already raised the channel dimension, the depthwise feature extraction yields multiple initial feature maps.
  • finally, pointwise convolution combines the features extracted by depthwise convolution and reduces the channel dimension to output the final feature map.
  • dimensionality reduction maintains network performance well and makes the network more lightweight, while the lower-dimensional features still contain all the necessary information.
  • Figure 4 is a flow chart of a method for target detection using a multi-scale feature fusion mechanism provided by an embodiment of the present application.
  • the target detection method provided by an embodiment of the present application includes but is not limited to step S400, step S410, step S420 and step S430.
  • Step S400 obtain the first final feature map output by the first depth-separable convolution layer and the height and width of the first final feature map;
  • Step S410 obtain the second final feature map output by the second depth-separable convolution layer and adjust its height and width so that they are the same as the height and width of the first final feature map;
  • Step S420 perform channel splicing and convolution on the adjusted second final feature map and the first final feature map to obtain fusion features
  • Step S430 Perform target detection based on fusion features.
  • a multi-scale feature fusion mechanism is used for target detection.
  • the receptive field of a high-level feature network is relatively large and its semantic representation ability is strong, but its resolution is low and its representation of feature detail is weak (spatial geometric feature details are lacking); the receptive field of a low-level feature network is relatively small and its representation of detailed feature information is strong, its resolution is high, but its semantic representation ability is weak.
  • the semantic information of high-level feature networks makes it possible to detect or segment targets accurately. Therefore, in target detection, these features are fused together for detection and segmentation to improve the target detection effect.
  • the small-scale feature network has a large receptive field and is suitable for detecting large targets.
  • the large-scale feature network has a smaller receptive field and is therefore suitable for detecting small targets.
  • by obtaining the first final feature map output by the first depth-separable convolution layer along with its height and width, the second final feature map output by the second depth-separable convolution layer is obtained and adjusted
  • so that its height and width are the same as the height and width of the first final feature map.
  • the adjusted second final feature map and the first final feature map are then merged by channel splicing and convolution.
  • target detection is performed based on the fusion features: the feature map output by a later layer of the network is adjusted to match the feature map output by an earlier layer for feature splicing, so that deep features and shallow features are detected at the same time, improving the expressiveness of the features and improving the target detection model's detection ability for targets of different sizes.
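The adjust-then-splice fusion described above can be sketched in plain Python (a minimal illustration only: feature maps are represented as grids of channel lists, the resize uses nearest-neighbour sampling, and the 1x1 convolution applied after splicing is omitted; all function names are illustrative assumptions, not from the source):

```python
def upsample_nearest(fmap, target_h, target_w):
    """Nearest-neighbour resize of a feature map, represented as a grid
    (list of rows) whose cells are lists of channel values."""
    h, w = len(fmap), len(fmap[0])
    return [[fmap[i * h // target_h][j * w // target_w] for j in range(target_w)]
            for i in range(target_h)]

def concat_channels(map_a, map_b):
    """Channel splicing: join the channel lists of corresponding cells."""
    return [[ca + cb for ca, cb in zip(ra, rb)] for ra, rb in zip(map_a, map_b)]

def fuse(shallow_map, deep_map):
    """Adjust the deeper (spatially smaller) map to the shallow map's
    height and width, then splice channels, mirroring steps S400-S420."""
    h, w = len(shallow_map), len(shallow_map[0])
    return concat_channels(shallow_map, upsample_nearest(deep_map, h, w))
```

For example, fusing a 2x2 single-channel shallow map with a 1x1 deep map yields a 2x2 map whose cells each carry two channels, one shallow and one deep.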
  • each feature grid detects 4 bounding boxes. If the center point of an object falls within a feature grid, only the bounding box with the largest IOU (Intersection Over Union) overlap with the ground-truth box is used for detection, and the other bounding boxes with smaller IOU values are discarded. This improves the model's detection ability for targets of different sizes and improves the generalization ability of the feature grid's bounding boxes.
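The IOU-based selection rule described here can be sketched as follows (a hedged illustration: boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples, and the function names are not from the source):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def select_responsible_box(predicted_boxes, gt_box):
    """Keep only the predicted box with the largest IOU against the
    ground-truth box; the other boxes are discarded."""
    return max(predicted_boxes, key=lambda b: iou(b, gt_box))
```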
  • Figure 5 is a flow chart of the NetAdapt algorithm optimization process provided by the embodiment of the present application.
  • the NetAdapt algorithm optimization process provided by the embodiment of the present application includes but is not limited to step S500 and step S510.
  • Step S500 Optimize the convolution kernel of a layer of original depth-separable convolutional network to obtain multiple second depth-separable convolutional networks;
  • Step S510 Compare the latency and accuracy of a second depth-separable convolution network with the original depth-separable convolution network to which it corresponds, and select the final depth-separable convolution network based on the comparison results.
  • the embodiment of this application uses a network compression method, the NetAdapt algorithm, which deploys the optimized network on the device to directly obtain actual performance indicators, and then uses these actually measured indicators to guide a new network compression strategy; network compression proceeds in this iterative manner to obtain the final result.
  • NetAdapt network optimization is performed in an automated manner to gradually reduce the resource consumption of the pre-trained network while maximizing accuracy. The optimization loop runs until the resource budget is met.
  • NetAdapt can generate not only one network that meets the budget, but also a series of simplified networks with different trade-offs, enabling dynamic network selection and further research.
  • the NetAdapt algorithm is used to search for the number of convolution kernels in each depth-separable convolutional network layer and to optimize that number. The ultimate goal is to find, in the set of second depth-separable convolution networks that satisfies the latency decay constraint, a network with high accuracy and low latency to serve as the final depth-separable convolution network, optimizing the latency of the target detection model while maintaining accuracy and reducing the size of the expansion layers and of the bottleneck in each depth-separable convolutional network layer.
  • the NetAdapt algorithm optimizes the convolution kernels of one layer of the original depth-separable convolutional network, obtaining multiple second depth-separable convolution networks that form a set. A second depth-separable convolution network is selected from the set and compared, in latency and accuracy, with the original depth-separable convolution network to which it corresponds, and the final depth-separable convolution network is selected based on the comparison results: when the latency of the second depth-separable convolution network is greater than that of the original network and/or its accuracy is lower than that of the original network, the original depth-separable convolution network is selected as the final depth-separable convolution network; when the latency of the second depth-separable convolution network is lower than that of the original network and its accuracy is higher, the second depth-separable convolution network is selected as the final depth-separable convolution network.
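The comparison rule above can be sketched as a small selection function (an illustrative sketch only: networks are modeled as dictionaries with measured `latency` and `accuracy` fields, and the function names are assumptions, not NetAdapt's actual API):

```python
def choose_final_network(original, candidate):
    """Adopt the candidate only when it is both faster (lower latency)
    and more accurate than the original network; otherwise keep the
    original, matching the comparison rule in the text."""
    faster = candidate["latency"] < original["latency"]
    more_accurate = candidate["accuracy"] > original["accuracy"]
    return candidate if faster and more_accurate else original

def netadapt_select(original, candidates):
    """Apply the rule across a set of candidate (shrunk) networks,
    carrying forward the best network found so far."""
    best = original
    for net in candidates:
        best = choose_final_network(best, net)
    return best
```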
  • Figure 6 is a flow chart of the pruning algorithm optimization process provided by the embodiment of the present application.
  • the pruning algorithm optimization process provided by the embodiment of the present application includes but is not limited to step S600 and step S610.
  • Step S600 prune the network structure of the intermediate target detection model to remove redundant weight parameters of the network structure;
  • Step S610 Fine-tune the intermediate target detection model after pruning.
  • the intermediate target detection model is obtained by training the initial target detection model; it contains a large number of redundant weight parameters and neurons that are useless for target detection, making the overall model too bloated.
  • the pruning algorithm is therefore used to prune the network structure of the intermediate target detection model,
  • removing redundant weight parameters and useless neurons to achieve a more compact target detection model.
  • Pruning the network structure of the intermediate target detection model to remove redundant weight parameters includes, but is not limited to, the following steps: first, encode the network structure using the per-layer channel counts of the pruned structure, converting it into a group of encoding vectors; to search for an optimal pruned network, various encoding vectors are tried repeatedly, and re-entering each into the pruning network generates the pruned network weights; then the performance of the pruned intermediate target detection model is obtained based on the network structure, the network weights and a preset validation set. An evolutionary algorithm is then used to search for the optimal encoding vector as the final encoding vector, and the final target detection model is obtained from the final encoding vector. When searching for the final encoding vector with the evolutionary algorithm, a custom objective function is used.
  • the objective function includes, but is not limited to, the network's accuracy function, latency function and computation-cost function.
  • an evolutionary algorithm is used to search for the optimal encoding vector as the final encoding vector, and the final target detection model is obtained based on the final encoding vector.
  • Specific operations include but are not limited to treating the encoding vector as a vector representation of the number of channels in each layer of the network.
  • the number of channels in each layer can correspond to genes in the evolutionary algorithm.
  • a large number of genes are first selected at random; by computing, on the preset validation set, the accuracy of the network weights generated by the pruning network, the top K genes with the highest accuracy are kept, and new genes are then generated using crossover and mutation. Mutation randomly changes a proportion of the elements in a gene.
  • Crossover is to randomly recombine the genes of two parents to produce a new gene combination.
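The top-K selection, crossover, and mutation loop described above can be sketched as a small evolutionary search over channel-count encoding vectors (a hedged illustration: the population size, mutation rate, and fitness function are placeholders, not values from the source):

```python
import random

def crossover(parent_a, parent_b, rng):
    """Randomly recombine two parents' genes, position by position."""
    return [rng.choice(pair) for pair in zip(parent_a, parent_b)]

def mutate(gene, choices, rate, rng):
    """Randomly replace a proportion (rate) of the gene's elements."""
    return [rng.choice(choices) if rng.random() < rate else g for g in gene]

def evolve(fitness, choices, length, rng, population=20, top_k=5, generations=10):
    """Keep the top-K genes each generation and refill the population
    with crossover + mutation, iterating as described in the text."""
    pop = [[rng.choice(choices) for _ in range(length)] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:top_k]
        children = []
        while len(children) < population - top_k:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), choices, 0.1, rng))
        pop = parents + children
    return max(pop, key=fitness)
```

In the patent's setting the fitness would be the pruned network's validation accuracy (possibly combined with latency and computation cost); here any callable over an encoding vector works.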
  • using AutoML (Auto Machine Learning), the embodiments of this application can automatically generate a network with pruning weights for each layer, then evaluate the performance of the pruned network on a preset validation set, thereby selecting the optimal network structure as the final target detection model.
  • the training pruning network consists of l pruning blocks, each composed of two fully connected layers.
  • the training pruning network takes the network encoding vector as input and generates a weight matrix.
  • the training pruning network uses the values in the network encoding vector as the output channels, and crops the generated weight matrix to match the input and output of the training pruning network.
  • the weights of the training pruning network, which are the parameters of the fully connected layers, are updated by computing the gradients of the training pruning network.
  • the training system can obtain different training pruning network structures by randomly generating different network encoding vectors. With the network structure and network weights available, the performance of the network can be tested on the validation set. Finally, the evolutionary algorithm can be used to search for the optimal encoding vector to obtain the optimal training pruning network.
  • the specific operation is to regard network coding as a vector representation of the number of channels in each layer of the network. At this time, the number of channels in each layer can correspond to the genes in the evolutionary algorithm.
  • a large number of genes are first selected at random; by computing on the validation set the accuracy of the weights generated by the pruning network, the top K genes with the highest accuracy are kept, and new genes are then generated using crossover and mutation. Mutation randomly changes a proportion of the elements in a gene.
  • Crossover means randomly recombining the genes of two parents to produce a new gene combination. By iterating this process repeatedly, the optimal training pruning network code can be obtained.
  • the initial target detection model also includes a system loss function for target detection.
  • the system loss function includes a bounding box coordinate error function, a bounding box confidence error function, and a classification error function.
  • the first term is the bounding-box coordinate error function; the second term is the loss function for the height and width of the bounding box; the third term is the bounding-box confidence error function when an object exists; the fourth term is the bounding-box confidence loss function when no object exists; the fifth term is the classification error function of the unit grids in which objects exist.
  • S is the unit-grid division coefficient of the image; B is the number of bounding boxes predicted by each grid; C is the total number of categories; p is the category probability; the indicator function denotes that an object exists in the i-th unit grid and that the j-th bounding box in that cell predicts the target; λ_coord and λ_noobj are the weight coefficients of the different loss functions.
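The equation itself appears only as an image in the source text. The five terms and the symbols described here match the standard YOLO-style loss, so one plausible reconstruction is given below as an aid to the reader (an assumption, not necessarily the patent's exact formula; here C_i denotes the box confidence, distinct from the category count C above):

```latex
L = \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
  + \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]
  + \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2
  + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2
  + \sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c \in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
```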
  • Figure 7 is a flow chart of the application method of the target detection model provided by the embodiment of the present application.
  • the application method of the target detection model provided by the embodiment of the present application includes but is not limited to step S700, step S710 and step S720.
  • Step S700 obtain the actual digital image and input the actual digital image into the target detection model
  • Step S710 perform feature extraction on the actual digital image through the depth-separable convolution layer of the target detection model, and output a feature map
  • Step S720 Perform target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  • an actual digital image is obtained and input into the target detection model; feature extraction is performed on the actual digital image through the depth-separable convolution layer of the target detection model, and a feature map is output; through the target detection model
  • the multi-scale feature fusion mechanism performs target detection on feature maps.
  • this embodiment of the present application also provides a device for establishing a target detection model, including:
  • the network modification module 800 is used to obtain the basic target detection network, replace the ordinary convolutional layer of the basic target detection network with a depth-separable convolutional layer, and add a multi-scale feature fusion mechanism to the basic target detection network to obtain initial target detection.
  • Model
  • the digital image acquisition module 810 is used to acquire a preset digital image and input the preset digital image into the initial target detection model;
  • the feature extraction module 820 is used to extract features from the preset digital image through the depth-separable convolution layer of the initial target detection model and output a feature map;
  • the target detection module 830 is used to perform target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
  • the model optimization module 840 is used to optimize the intermediate target detection model using the NetAdapt algorithm and the pruning algorithm to obtain the final target detection model.
  • this embodiment of the present application also provides a target detection device, including:
  • the digital image acquisition module 900 is used to acquire actual digital images and input the actual digital images into the target detection model;
  • the feature extraction module 910 is used to extract features from the actual digital image through the depth-separable convolution layer of the target detection model and output a feature map;
  • the target detection module 920 is used to perform target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  • embodiments of the present application also provide a target detection device, which includes: a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • the processor and memory may be connected via a bus or other means.
  • memory can be used to store non-transitory software programs and non-transitory computer executable programs.
  • the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device.
  • the memory may optionally include memory located remotely from the processor, and the remote memory may be connected to the processor via a network. Examples of the above-mentioned networks include but are not limited to the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
  • the target detection device in this embodiment can be applied to perform the method for establishing the target detection model in the above embodiment and/or the application method of the target detection model in the above embodiment.
  • the target detection device in this embodiment shares the same inventive concept as the method for establishing the target detection model and/or the application method of the target detection model in the above embodiments; these embodiments therefore have the same implementation principles and technical effects, which will not be described in detail here.
  • the non-transient software programs and instructions required to implement the method for establishing the target detection model in the above embodiment and/or the application method of the target detection model in the above embodiment are stored in the memory; when executed by the processor, they perform the method for establishing the target detection model and/or the application method of the target detection model of the above embodiments,
  • for example, performing the above-described method steps S200 to S250 in Figure 2, method steps S310 to S320 in Figure 3, method steps S400 to S430 in Figure 4, method steps S500 to S510 in Figure 5, method steps S600 to S610 in Figure 6, and method steps S700 to S720 in Figure 7.
  • the object detection device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • embodiments of the present application also provide a computer-readable storage medium that stores computer-executable instructions.
  • the computer-executable instructions are executed by a processor or a controller, for example, by a processor in the above target detection device embodiment,
  • which can cause the processor to perform the method for establishing the target detection model and/or the application method of the target detection model in the above embodiments, for example, performing the above-described method steps S200 to S250 in Figure 2, method steps S310 to S320 in Figure 3, method steps S400 to S430 in Figure 4, method steps S500 to S510 in Figure 5, method steps S600 to S610 in Figure 6, and method steps S700 to S720 in Figure 7.
  • the above computer-readable storage medium may be non-volatile or volatile.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above units is only a logical functional division; in actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.

Abstract

This application discloses a method for establishing a target detection model, an application method, a device, an apparatus and a medium, which can be used in the field of image recognition. The establishment method includes: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting feature maps; performing target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and optimizing the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model. This application can effectively improve the efficiency of target detection on embedded devices.

Description

Method for establishing a target detection model, application method, device, apparatus and medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 15, 2022, with application number 2022102546857 and entitled "Method for establishing a target detection model, application method, device, apparatus and medium", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the fields of computer vision and image recognition in artificial intelligence, and in particular to a method for establishing a target detection model, an application method, a device, an apparatus and a medium.
Background
Target detection is widely used in many fields such as artificial intelligence, robot navigation, intelligent video surveillance, industrial inspection and aerospace; it is also a preliminary algorithm required by many vision tasks and plays a crucial role in subsequent tasks such as face recognition, gait recognition, crowd counting and instance segmentation. Thanks to the wide application of deep learning, target detection algorithms have developed relatively quickly. Deep-learning-based target detection algorithms are mainly divided into (1) two-stage target detection, in which the first stage generates candidate regions containing approximate location information of the target, and the second stage classifies the candidate regions and refines their positions; and (2) single-stage target detection, which directly generates the class probabilities and corresponding position coordinates of objects.
Technical Problem
The following is a technical problem of the prior art recognized by the inventors: the number of network parameters of existing target detection algorithms is relatively large, and in practical applications real-time target detection can only be achieved by relying on large computers such as servers; when the network structure is ported to embedded devices such as mobile phones, it is difficult to achieve this effect, because the processor performance of embedded devices is far inferior to that of servers.
Technical Solution
In a first aspect, an embodiment of this application provides a method for establishing a target detection model, including: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting feature maps; performing target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and optimizing the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model.
In a second aspect, an embodiment of this application provides an application method of a target detection model, including: obtaining an actual digital image and inputting the actual digital image into the final target detection model; performing feature extraction on the actual digital image through the depth-separable convolutional layers of the target detection model and outputting feature maps; and performing target detection on the feature maps through the multi-scale feature fusion mechanism of the target detection model.
In a third aspect, an embodiment of this application provides an apparatus for establishing a target detection model, including: a network modification module, configured to obtain a basic target detection network, replace the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and add a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; a digital image acquisition module, configured to obtain a preset digital image and input the preset digital image into the initial target detection model; a feature extraction module, configured to perform feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and output feature maps; a target detection module, configured to perform target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and a model optimization module, configured to optimize the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model.
In a fourth aspect, an embodiment of this application provides a target detection apparatus, including: a digital image acquisition module, configured to obtain an actual digital image and input the actual digital image into a target detection model; a feature extraction module, configured to perform feature extraction on the actual digital image through the depth-separable convolutional layers of the target detection model and output feature maps; and a target detection module, configured to perform target detection on the feature maps through the multi-scale feature fusion mechanism of the target detection model.
In a fifth aspect, an embodiment of this application provides a target detection device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements a method for establishing a target detection model and/or an application method of a target detection model. The method for establishing the target detection model includes: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting feature maps; performing target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and optimizing the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model. The application method of the target detection model includes: obtaining an actual digital image and inputting the actual digital image into the target detection model; performing feature extraction on the actual digital image through the depth-separable convolutional layers of the target detection model and outputting feature maps; and performing target detection on the feature maps through the multi-scale feature fusion mechanism of the target detection model.
In a sixth aspect, an embodiment of this application provides a computer-readable storage medium storing computer-executable instructions for performing a method for establishing a target detection model and/or an application method of a target detection model. The method for establishing the target detection model includes: obtaining a basic target detection network, replacing the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and adding a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model; obtaining a preset digital image and inputting the preset digital image into the initial target detection model; performing feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and outputting feature maps; performing target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and optimizing the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model. The application method of the target detection model includes: obtaining an actual digital image and inputting the actual digital image into the target detection model; performing feature extraction on the actual digital image through the depth-separable convolutional layers of the target detection model and outputting feature maps; and performing target detection on the feature maps through the multi-scale feature fusion mechanism of the target detection model.
Beneficial Effects
The method for establishing a target detection model, the application method, the device, the apparatus and the medium proposed in the embodiments of this application obtain a basic target detection network, replace its ordinary convolutional layers with depth-separable convolutional layers, and add a multi-scale feature fusion mechanism to obtain an initial target detection model; obtain a preset digital image and input it into the initial target detection model; perform feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model and output feature maps; perform target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model; and optimize the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model. This application replaces the ordinary convolutional layers in the basic target detection network with depth-separable convolutional layers and adds a multi-scale feature fusion mechanism to obtain the initial target detection model. The initial target detection model extracts features using depth-separable convolution, which has fewer parameters than ordinary convolution, thereby reducing the data processing burden on the processor of an embedded device; likewise, for the same number of parameters, a neural network using depth-separable convolution can be made deeper. Adding the multi-scale feature fusion mechanism enables the target detection model to learn deep features and shallow features simultaneously during detection, so the features are expressed better and the detection accuracy of targets is strengthened. A preset digital image is obtained and input into the initial target detection model for model training, yielding an intermediate target detection model. The NetAdapt algorithm and the pruning algorithm are used to optimize the intermediate target detection model to obtain the final target detection model; these algorithms miniaturize the intermediate target detection model, accelerating inference, so that the final target detection model can run on different embedded devices with a faster overall detection speed, improving the effect of target detection on embedded devices.
Brief Description of the Drawings
The drawings are provided for a further understanding of the technical solution of this application and constitute a part of the specification; together with the embodiments of this application, they are used to explain the technical solution of this application and do not constitute a limitation thereof.
Figure 1 is a schematic diagram of a system architecture platform for establishing and applying a target detection model provided by an embodiment of this application;
Figure 2 is a flow chart of a method for establishing a target detection model provided by an embodiment of this application;
Figure 3 is a flow chart of a feature extraction method provided by an embodiment of this application;
Figure 4 is a flow chart of a method for target detection using a multi-scale feature fusion mechanism provided by an embodiment of this application;
Figure 5 is a flow chart of the NetAdapt algorithm optimization process provided by an embodiment of this application;
Figure 6 is a flow chart of the pruning algorithm optimization process provided by an embodiment of this application;
Figure 7 is a flow chart of an application method of the target detection model provided by an embodiment of this application;
Figure 8 is a schematic structural diagram of an apparatus for establishing a target detection model provided by an embodiment of this application;
Figure 9 is a schematic structural diagram of a target detection apparatus provided by an embodiment of this application.
Detailed Description of the Embodiments
To make the purposes, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
In the description of this application, "several" means one or more, and "multiple" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including the stated number. Where "first" and "second" are used, they serve only to distinguish technical features and cannot be understood as indicating or implying relative importance, implicitly indicating the number of the indicated technical features, or implicitly indicating the order of the indicated technical features.
In the description of this application, unless otherwise explicitly defined, words such as "set", "install" and "connect" should be understood broadly, and those skilled in the art can reasonably determine the specific meanings of these words in this application in light of the specific content of the technical solution.
It should be noted that although functional modules are divided in the apparatus schematic diagrams and a logical order is shown in the flow charts, in some cases the steps shown or described may be performed with a module division different from that in the apparatus, or in an order different from that in the flow charts. The terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.
First, several terms involved in this application are explained:
Artificial Intelligence (AI): a new technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. AI is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing and expert systems. AI can simulate the information processes of human consciousness and thinking. AI is also a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
Natural Language Processing (NLP): NLP uses computers to process, understand and apply human languages (such as Chinese and English). It is a branch of artificial intelligence and an interdisciplinary field of computer science and linguistics, often called computational linguistics. Natural language processing includes syntactic analysis, semantic analysis and discourse understanding, and is commonly used in technical fields such as machine translation, handwritten and printed character recognition, speech recognition and text-to-speech conversion, information retrieval, information extraction and filtering, text classification and clustering, public opinion analysis and opinion mining; it involves data mining, machine learning, knowledge acquisition, knowledge engineering and artificial intelligence research related to language processing, as well as linguistic research related to language computing.
Information Extraction (NER): a text processing technology that extracts specified types of factual information, such as entities, relations and events, from natural language text and outputs structured data. Information extraction is a technique for extracting specific information from text data. Text data is composed of specific units, such as sentences, paragraphs and chapters; text information is composed of smaller specific units such as characters, words, phrases, sentences, paragraphs or combinations of these units. Extracting noun phrases, person names and place names from text data is all text information extraction, and the information extracted by information extraction technology can of course be of various types.
In related technologies, target detection is a popular direction in the fields of computer vision and digital image processing, widely used in many fields such as artificial intelligence, robot navigation, intelligent video surveillance, industrial inspection and aerospace; it is also a preliminary algorithm required by many vision tasks and plays a crucial role in subsequent tasks such as face recognition, gait recognition, crowd counting and instance segmentation. Thanks to the wide application of deep learning, target detection algorithms have developed relatively quickly. Deep-learning-based target detection algorithms are mainly divided into (1) two-stage target detection, in which the first stage generates candidate regions containing approximate location information of the target, and the second stage classifies the candidate regions and refines their positions; and (2) single-stage target detection, which directly generates the class probabilities and corresponding position coordinates of objects. Compared with two-stage algorithms, existing single-stage algorithms do not need to generate candidate regions, so the overall process is simpler and faster, but the accuracy is not high enough; two-stage algorithms, while guaranteeing accuracy, are not fast enough. Moreover, the number of network parameters of existing target detection algorithms is relatively large, and in practical applications real-time detection can only be achieved by relying on large computers such as servers; when the network structure is ported to embedded devices such as mobile phones, it is difficult to achieve this effect, because the processor performance of embedded devices is far inferior to that of servers.
On this basis, the embodiments of this application provide a method for establishing a target detection model, an application method, a device, an apparatus and a medium, which can effectively improve the efficiency of target detection on embedded devices.
The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
Basic artificial intelligence technologies generally include technologies such as sensors, dedicated AI chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems and mechatronics. AI software technologies mainly include several major directions such as computer vision technology, robotics, biometrics, speech processing technology, natural language processing technology and machine learning/deep learning.
The method for establishing a target detection model and the application method provided by the embodiments of this application relate to the fields of artificial intelligence and digital medical technology. They can be applied in a terminal, on a server side, or as software running in a terminal or on a server side. In some embodiments, the terminal may be a smartphone, tablet computer, laptop computer, desktop computer, or the like; the server side may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application implementing the target detection model, but is not limited to the above forms.
This application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. This application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network; in a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
The embodiments of this application are further described below with reference to the drawings.
As shown in Figure 1, Figure 1 is a schematic diagram of a system architecture platform for establishing and applying a target detection model provided by an embodiment of this application.
The system architecture platform 100 of this embodiment includes one or more processors 110 and a memory 120; in Figure 1, one processor 110 and one memory 120 are taken as an example.
The processor 110 and the memory 120 may be connected via a bus or in other ways; connection via a bus is taken as an example in Figure 1.
As a non-transitory computer-readable storage medium, the memory 120 can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory 120 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory 120 may optionally include memory located remotely from the processor 110, and such remote memory may be connected to the system architecture platform 100 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
Those skilled in the art can understand that the device structure shown in Figure 1 does not constitute a limitation of the system architecture platform 100, which may include more or fewer components than shown, combine certain components, or have a different component arrangement.
As shown in Figure 2, Figure 2 is a flow chart of a method for establishing a target detection model provided by an embodiment of this application; the method includes but is not limited to step S200, step S210, step S220, step S230, step S240 and step S250.
Step S200, obtain a basic target detection network;
Step S210, replace the ordinary convolutional layers of the basic target detection network with depth-separable convolutional layers, and add a multi-scale feature fusion mechanism to the basic target detection network to obtain an initial target detection model;
Step S220, obtain a preset digital image and input the preset digital image into the initial target detection model;
Step S230, perform feature extraction on the preset digital image through the depth-separable convolutional layers of the initial target detection model, and output feature maps;
Step S240, perform target detection on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
Step S250, optimize the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model.
In this embodiment, a basic target detection network is obtained and its backbone structure is replaced with a lightweight neural network, that is, a neural network model requiring fewer parameters and a smaller computational cost. Because the computational overhead of a tiny neural network is small, the tiny neural network model can be deployed on devices with limited computing resources, such as smartphones, tablet computers or other embedded devices.
The ordinary convolutional layers of the basic target detection network are replaced with depth-separable convolutional layers. Depth-separable convolution performs an independent convolution for each channel and then uses pointwise convolution to weight and combine the feature maps along the depth dimension, generating new feature maps. It has fewer parameters than ordinary convolution, effectively reducing the data processing burden on the processor of an embedded device; for the same number of parameters, a neural network using depth-separable convolution can be made deeper, greatly improving the feature extraction effect for target detection.
Adding the multi-scale feature fusion mechanism to the basic target detection network enables the target detection model to learn deep features and shallow features simultaneously during detection, expressing features better and strengthening the detection accuracy of targets.
The ordinary convolutional layers of the basic target detection network are replaced with depth-separable convolutional layers and the multi-scale feature fusion mechanism is added to obtain the initial target detection model; a preset digital image is obtained and input into the initial target detection model for model training. During model training, feature extraction is performed on the preset digital image through the depth-separable convolutional layers of the initial target detection model, and feature maps are output; target detection is performed on the feature maps through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model. Because the intermediate target detection model is too bloated, the NetAdapt (Platform-Aware Neural Network Adaptation for Mobile Applications) algorithm and a pruning algorithm are used to optimize it, yielding the final target detection model; these algorithms miniaturize the intermediate target detection model to accelerate inference, so that the final target detection model can run on different embedded devices with a faster overall detection speed, improving the effect of target detection on embedded devices.
It should be noted that, in this embodiment, the NetAdapt algorithm optimizes the convolution kernels of the depth-separable convolutional layers of the intermediate target detection model, and the pruning algorithm optimizes the network structure of the intermediate target detection model.
In addition, it should be noted that, in this embodiment, the above model training is a conventional model training procedure.
As shown in Figure 3, Figure 3 is a flow chart of a feature extraction method provided by an embodiment of this application; the method includes but is not limited to step S300, step S310 and step S320.
Step S300, use pointwise convolution to raise the channel dimension of the preset digital image;
Step S310, use depthwise convolution to perform feature extraction on the channel-raised preset digital image, obtaining multiple initial feature maps;
Step S320, use pointwise convolution to reduce the channel dimension of the multiple initial feature maps, and output the final feature map.
In this embodiment, feature extraction is performed on the preset digital image through the depth-separable convolutional layer, and the final feature map is output. Depthwise Separable Convolution mainly consists of Depthwise Convolution and Pointwise Convolution. Compared with ordinary convolution, depth-separable convolution can execute fewer convolution steps while achieving the same feature extraction effect, with fewer parameters; for the same number of parameters, it can be made deeper than ordinary convolution, greatly improving the feature extraction effect for target detection.
After the depth-separable convolutional layer obtains the preset digital image, pointwise convolution is used to raise its channel dimension, because the computational characteristics of depthwise convolution mean that it cannot change the number of channels by itself: it can only output as many channels as the previous layer gives it. If the previous layer provides very few channels, depthwise convolution can only extract features in a low-dimensional space, and the effect is not good enough. To improve this, pointwise convolution is applied before depthwise feature extraction to raise the image's channel dimension, and a channel expansion factor T is defined, so that regardless of the number of input channels, after pointwise channel expansion the depthwise convolution always extracts features efficiently in a relatively higher-dimensional space.
Depthwise convolution is then used to process the channel-raised preset digital image. Because the convolution kernels of depthwise convolution correspond one-to-one with the input channels (one kernel is responsible for one input channel, and one input channel is convolved by only one kernel), the number of output initial feature maps equals the number of kernels; since pointwise channel expansion is performed before depthwise feature extraction, the depthwise convolution yields multiple initial feature maps.
Pointwise convolution is then used to reduce the channel dimension of the multiple initial feature maps, combining the features extracted by the depthwise convolution and outputting the final feature map. The dimensionality reduction preserves network performance well and makes the network more lightweight, while the lower-dimensional features contain all the necessary information.
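As a rough sanity check on the parameter savings described above, the weight counts of an ordinary convolution, a depthwise-separable convolution, and the expand-depthwise-project structure with channel expansion factor T can be compared (a sketch only: bias terms are ignored and the layer shapes used below are illustrative assumptions, not values from the source):

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in an ordinary k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

def expand_depthwise_project_params(c_in, c_out, k, t):
    """Pointwise expansion by factor t, depthwise k x k convolution,
    then pointwise projection, matching the expand -> depthwise ->
    project structure described in the text."""
    c_mid = c_in * t
    return c_in * c_mid + c_mid * k * k + c_mid * c_out
```

For a 3x3 layer with 32 input and 64 output channels, the depthwise-separable form uses 2,336 weights versus 18,432 for the ordinary convolution, illustrating why the replacement reduces the processing burden on embedded devices.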
As shown in Figure 4, Figure 4 is a flow chart of a method for target detection using a multi-scale feature fusion mechanism provided by an embodiment of this application; the method includes but is not limited to step S400, step S410, step S420 and step S430.
Step S400, obtain the first final feature map output by the first depth-separable convolutional layer and the height and width of the first final feature map;
Step S410, obtain the second final feature map output by the second depth-separable convolutional layer and adjust its height and width so that they are the same as the height and width of the first final feature map;
Step S420, perform channel splicing and convolution on the adjusted second final feature map and the first final feature map to obtain fusion features;
Step S430, perform target detection based on the fusion features.
In this embodiment, a multi-scale feature fusion mechanism is used for target detection. The receptive field of a high-level feature network is relatively large and its semantic representation ability is strong, but its resolution is low and its feature representation ability is weak (spatial geometric details are lacking); the receptive field of a low-level feature network is relatively small and its feature-detail representation ability is strong, but although its resolution is high, its semantic representation ability is weak. The semantic information of high-level feature networks can accurately detect or segment targets; therefore, in target detection, all these features are combined for detection and segmentation to improve the detection effect. A small-scale feature network has a large receptive field and is suitable for detecting large objects; a large-scale feature network has a smaller receptive field and is therefore suitable for detecting small targets. Several feature networks of different scales are used to detect targets of different sizes, strengthening detection accuracy.
By obtaining the first final feature map output by the first depth-separable convolutional layer along with its height and width, the second final feature map output by the second depth-separable convolutional layer is obtained and its height and width are adjusted to match those of the first final feature map; the adjusted second final feature map and the first final feature map are merged by channel splicing and convolution to obtain fusion features, and target detection is performed based on the fusion features. The feature map output by a later layer of the network is adjusted and spliced with the feature map output by an earlier layer, so that deep features and shallow features are detected at the same time, improving the expressiveness of the features and the model's detection ability for targets of different sizes.
It should be noted that when the initial target detection model is trained, each feature grid detects 4 bounding boxes; if the center point of an object falls within a feature grid, only the bounding box with the largest IOU (Intersection Over Union) overlap with the ground-truth box is used for detection, and the other bounding boxes with smaller IOU values are discarded. This improves the model's detection ability for targets of different sizes and the generalization ability of the feature grid's bounding boxes.
As shown in Figure 5, Figure 5 is a flow chart of the NetAdapt algorithm optimization process provided by an embodiment of this application; the process includes but is not limited to step S500 and step S510.
Step S500, optimize the convolution kernels of one layer of the original depth-separable convolutional network to obtain multiple second depth-separable convolutional networks;
Step S510, compare a second depth-separable convolutional network with the original depth-separable convolutional network to which it corresponds in terms of latency and accuracy, and select the final depth-separable convolutional network based on the comparison results.
In the prior art, there are generally two common approaches to designing an efficient network structure: (1) design a single network model regardless of the platform, but then the model performs differently on different platforms; (2) manually design a corresponding network structure for a given hardware platform, which requires detailed knowledge of the underlying hardware and must be redone when the platform changes.
Neither of the above approaches meets the requirements of this application. The embodiment of this application uses a network compression method, the NetAdapt algorithm, which deploys the optimized network on the device to directly obtain actual performance indicators, and then uses these actually measured indicators to guide a new network compression strategy; compression proceeds in this iterative manner to obtain the final result. NetAdapt network optimization proceeds automatically, gradually reducing the resource consumption of the pre-trained network while maximizing accuracy; the optimization loop runs until the resource budget is met. With this design, NetAdapt can generate not only one network that meets the budget, but also a series of simplified networks with different trade-offs, enabling dynamic network selection and further research.
In this embodiment, the NetAdapt algorithm searches for the number of convolution kernels in each depth-separable convolutional network layer and optimizes that number; the ultimate goal is to find, in the set of second depth-separable convolutional networks that satisfies the latency decay constraint, a network with high accuracy and low latency as the final depth-separable convolutional network, optimizing the latency of the target detection model while maintaining accuracy and reducing the size of the expansion layers and of the bottleneck in each depth-separable convolutional layer. The NetAdapt algorithm optimizes the convolution kernels of one layer of the original depth-separable convolutional network, obtaining multiple second depth-separable convolutional networks that form a set; a second depth-separable convolutional network is selected from the set and compared with the corresponding original network in latency and accuracy, and the final network is selected based on the comparison results: when the latency of the second depth-separable convolutional network is greater than that of the original network and/or its accuracy is lower than that of the original network, the original depth-separable convolutional network is selected as the final depth-separable convolutional network; when the latency of the second depth-separable convolutional network is lower than that of the original network and its accuracy is higher, the second depth-separable convolutional network is selected as the final depth-separable convolutional network.
如图6所示,图6是本申请实施例提供的剪枝算法优化处理的流程图,本申请实施例提供的剪枝算法优化处理,包括但不限于步骤S600和步骤S610。
步骤S600,对中间目标检测模型的网络结构进行剪枝处理,去除网络结构的冗余权重参数;
步骤S610,对剪枝处理后的中间目标检测模型进行微调。
在本申请实施例中,中间目标检测模型由初始目标检测模型进行模型训练得来,拥有大量冗余权重参数和对于目标检测无用的神经元,导致模型整体过于臃肿,利用剪枝算法对中间目标检测模型的网络结构进行剪枝处理,去除网络结构中的冗余权重参数和无用神经元, 实现更加紧凑的目标检测模型。
对中间目标检测模型的网络结构进行剪枝处理,去除网络结构的冗余权重参数包括但不限于以下步骤:首先用剪枝处理后网络结构每层的通道数对网络结构进行编码,转换为一组编码向量,为了搜索到一个最佳的剪枝网络,不断尝试各种不同的编码向量,重新输入剪枝网络之后会生成剪枝后的网络权重;然后根据网络结构、网络权重和预设验证集得到剪枝处理后中间目标监测模型的性能。进而用进化算法搜索最优的编码向量作为最终编码向量,根据最终编码向量得到最终目标检测模型。在用进化算法搜索最终编码向量,使用自定义的目标函数,目标函数包括但不限于网络的准确率函数,时延函数和计算量函数。
需要说明的是,用进化算法搜索最优的编码向量作为最终编码向量,根据最终编码向量得到最终目标检测模型。具体操作包括但不限于将编码向量视为网络在每层上通道数量的向量表示,此时每层上的通道数量可以对应到进化算法中的基因。首先随机选择大量的基因,通过计算剪枝网络产生的网络权重在预设验证集上的准确率,取出前K个最高准确率的基因,然后使用交叉和变异方法产生新的基因。变异即为随机改变基因中的元素比例,交叉是随机重组两个双亲的基因的来产生新的基因组合,反复迭代这个过程,即可得到编码向量作为最终编码向量,根据最终编码向量得到最终目标检测模型。
It should be noted that, in the embodiments of the present application, AutoML (Automated Machine Learning) is used to find the final encoding vector and hence the final target detection model. AutoML automatically searches for an optimal structure; using this property, the embodiments can automatically generate a network with per-layer pruning weights, evaluate the pruned network's performance on the preset validation set, and thereby select the optimal network structure as the final target detection model.
In addition, it should be noted that, in the embodiments of the present application, the pruning algorithm requires pre-training. The training pruning network consists of l pruning blocks, each composed of two fully connected layers. In the forward pass, the training pruning network takes a network encoding vector as input and generates a weight matrix; the values in the encoding vector serve as the output channel counts, and the generated weight matrix is cropped to match the input and output dimensions of the training pruning network. For an input image, the forward loss of the pruned network can then be computed. In the backward pass, the gradients of the training pruning network are computed to update its weights, i.e., the parameters of the fully connected layers. Throughout training, the system randomly generates different network encoding vectors to obtain different training pruning network structures. With the network structure and the network weights available, the network's performance can be tested on the validation set. Finally, an evolutionary algorithm can search for the optimal encoding vector, yielding the optimal training pruning network. Specifically, the network encoding is regarded as the vector representation of the per-layer channel counts, which correspond to genes in the evolutionary algorithm: a large number of genes are selected at random, the accuracy on the validation set of the weights produced by the pruning network is computed, the top K genes with the highest accuracy are kept, and new genes are produced by crossover and mutation, where mutation randomly changes the proportion of elements in a gene and crossover randomly recombines the genes of two parents to produce a new combination. Iterating this process yields the optimal encoding of the training pruning network.
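The forward pass of one pruning block can be sketched as a small hypernetwork: two fully connected layers map the encoding vector to a full weight matrix, which is then cropped to the channel counts given by the encoding. `PruneBlock` and all dimensions are illustrative, and NumPy stands in for an autograd framework (the backward pass that updates the FC parameters is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

class PruneBlock:
    """One pruning block (two fully connected layers) mapping a network
    encoding vector to the weight matrix of the corresponding layer."""
    def __init__(self, code_dim, max_out, max_in):
        self.fc1 = rng.standard_normal((code_dim, 64))
        self.fc2 = rng.standard_normal((64, max_out * max_in))
        self.max_out, self.max_in = max_out, max_in

    def forward(self, code, out_ch, in_ch):
        hidden = np.tanh(code @ self.fc1)                     # first FC layer
        full = (hidden @ self.fc2).reshape(self.max_out, self.max_in)
        # Crop the generated matrix to the channel counts in the encoding.
        return full[:out_ch, :in_ch]
```

In the full scheme, one such block exists per prunable layer, and the cropped matrices are used as the pruned network's weights when computing the forward loss.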
In the embodiments of the present application, the initial target detection model further includes a system loss function for target detection, as shown in the following formula. The system loss function includes a bounding-box coordinate error function, a bounding-box confidence error function, and a classification error function.
$$\begin{aligned} \text{Loss} ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ &+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2 \end{aligned}$$
The first term is the bounding-box coordinate error function; the second term is the loss function on the height and width of the bounding box; the third term is the confidence error function for bounding boxes that contain an object; the fourth term is the confidence loss function for bounding boxes that contain no object; the fifth term is the classification error function over grid cells that contain an object. S is the grid-division coefficient of the image; B is the number of bounding boxes predicted per grid cell; C is the total number of classes; p is the class probability; $\mathbb{1}_{ij}^{obj}$ indicates that an object is present in the i-th grid cell and that the j-th bounding box of that cell is responsible for predicting it; λ_coord and λ_noobj are the weight coefficients of the different loss terms.
It should be noted that the above system loss function is likewise included in the intermediate target detection model and the final target detection model.
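A minimal NumPy sketch of the five-term loss follows. It assumes predictions and targets are flattened to rows of `(x, y, w, h, conf, class probs...)` with one row per bounding box, and simplifies the classification term to be applied per responsible box rather than per grid cell; `detection_loss` and its arguments are illustrative names:

```python
import numpy as np

def detection_loss(pred, target, obj_mask, lam_coord=5.0, lam_noobj=0.5):
    """Sketch of the five-term loss. pred/target: (N, 5+C) arrays holding
    (x, y, w, h, conf, class probs); obj_mask marks boxes responsible for an object."""
    noobj = ~obj_mask
    xy = np.sum(obj_mask[:, None] * (pred[:, :2] - target[:, :2]) ** 2)
    wh = np.sum(obj_mask[:, None] * (np.sqrt(pred[:, 2:4]) - np.sqrt(target[:, 2:4])) ** 2)
    conf_obj = np.sum(obj_mask * (pred[:, 4] - target[:, 4]) ** 2)
    conf_noobj = np.sum(noobj * (pred[:, 4] - target[:, 4]) ** 2)
    cls = np.sum(obj_mask[:, None] * (pred[:, 5:] - target[:, 5:]) ** 2)
    return lam_coord * (xy + wh) + conf_obj + lam_noobj * conf_noobj + cls
```

The square roots on width and height reduce the penalty imbalance between large and small boxes, as in the formula above.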
As shown in Fig. 7, Fig. 7 is a flowchart of the application method of the target detection model provided by an embodiment of the present application. The application method of the target detection model provided by the embodiment includes, but is not limited to, steps S700, S710, and S720.
Step S700: acquire an actual digital image, and input the actual digital image into the target detection model;
Step S710: perform feature extraction on the actual digital image through the depthwise separable convolutional layer of the target detection model, and output a feature map;
Step S720: perform target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
In the embodiments of the present application, an actual digital image is acquired and input into the target detection model; feature extraction is performed on the actual digital image through the depthwise separable convolutional layer of the target detection model to output a feature map; and target detection is performed on the feature map through the multi-scale feature fusion mechanism of the target detection model.
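The depthwise separable feature extraction (pointwise up-projection, depthwise convolution, pointwise down-projection, as in claim 3) can be sketched in NumPy; the function names and the expand–depthwise–project layout are illustrative:

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: mixes channels only. x: (C_in, H, W), w: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def depthwise_conv3x3(x, k):
    """3x3 per-channel convolution (no channel mixing). x: (C, H, W), k: (C, 3, 3)."""
    c, h, wdt = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[:, i, j][:, None, None] * pad[:, i:i + h, j:j + wdt]
    return out

def separable_block(x, w_expand, k_depth, w_project):
    """Expand channels (pointwise), extract features (depthwise), project back."""
    x = pointwise_conv(x, w_expand)                    # channel up-projection
    x = np.maximum(depthwise_conv3x3(x, k_depth), 0)   # depthwise conv + ReLU
    return pointwise_conv(x, w_project)                # channel down-projection
```

Compared with an ordinary convolution, factoring the layer this way replaces one C_out x C_in x 3 x 3 kernel with a per-channel 3x3 kernel plus two 1x1 kernels, which is the source of the parameter and latency savings.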
As shown in Fig. 8, an embodiment of the present application further provides an apparatus for establishing a target detection model, including:
a network modification module 800, configured to acquire a base target detection network, replace ordinary convolutional layers of the base target detection network with depthwise separable convolutional layers, and add a multi-scale feature fusion mechanism to the base target detection network to obtain an initial target detection model;
a digital image acquisition module 810, configured to acquire a preset digital image and input the preset digital image into the initial target detection model;
a feature extraction module 820, configured to perform feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model and output a feature map;
a target detection module 830, configured to perform target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
a model optimization module 840, configured to optimize the intermediate target detection model using the NetAdapt algorithm and a pruning algorithm to obtain a final target detection model.
It should be noted that the content of the method embodiments of the present application applies to this apparatus embodiment; the functions specifically implemented by this apparatus embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those of the above methods, and are not repeated here.
As shown in Fig. 9, an embodiment of the present application further provides a target detection apparatus, including:
a digital image acquisition module 900, configured to acquire an actual digital image and input the actual digital image into the target detection model;
a feature extraction module 910, configured to perform feature extraction on the actual digital image through the depthwise separable convolutional layer of the target detection model and output a feature map;
a target detection module 920, configured to perform target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
It should be noted that the content of the method embodiments of the present application applies to this apparatus embodiment; the functions specifically implemented by this apparatus embodiment are the same as those of the above method embodiments, and the beneficial effects achieved are also the same as those of the above methods, and are not repeated here.
In addition, an embodiment of the present application further provides a target detection device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
The processor and the memory may be connected by a bus or by other means.
As a non-transitory computer-readable storage medium, the memory may be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory may optionally include memory located remotely from the processor, and such remote memory may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
It should be noted that the target detection device in this embodiment may be applied to the method for establishing a target detection model of the above embodiments and/or the application method of the target detection model of the above embodiments. The target detection device in this embodiment shares the same inventive concept with those methods, so these embodiments have the same implementation principles and technical effects, which are not detailed here.
The non-transitory software programs and instructions required to implement the method for establishing a target detection model of the above embodiments and/or the application method of the target detection model of the above embodiments are stored in the memory and, when executed by the processor, perform those methods, for example, performing the method steps S200 to S250 in Fig. 2, S310 to S320 in Fig. 3, S400 to S430 in Fig. 4, S500 to S510 in Fig. 5, S600 to S610 in Fig. 6, and S700 to S720 in Fig. 7 described above.
The target detection device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example by a processor in the above target detection device embodiment, cause the processor to perform the method for establishing a target detection model of the above embodiments and/or the application method of the target detection model of the above embodiments, for example, performing the method steps S200 to S250 in Fig. 2, S310 to S320 in Fig. 3, S400 to S430 in Fig. 4, S500 to S510 in Fig. 5, S600 to S610 in Fig. 6, and S700 to S720 in Fig. 7 described above.
In addition, it should be noted that the above computer-readable storage medium may be non-volatile or volatile.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Those skilled in the art will understand that the technical solutions shown in Figs. 1-9 do not limit the embodiments of the present application, which may include more or fewer steps than shown, combine certain steps, or use different steps.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may exist physically as separate units, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The preferred embodiments of the present application have been described in detail above, but the present application is not limited to these embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application, and such equivalent modifications or substitutions are included within the scope defined by the claims of the present application.

Claims (20)

  1. A method for establishing a target detection model, wherein the method comprises:
    acquiring a base target detection network, replacing ordinary convolutional layers of the base target detection network with depthwise separable convolutional layers, and adding a multi-scale feature fusion mechanism to the base target detection network to obtain an initial target detection model;
    acquiring a preset digital image, and inputting the preset digital image into the initial target detection model;
    performing feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model, and outputting a feature map;
    performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
    optimizing the intermediate target detection model using a NetAdapt algorithm and a pruning algorithm to obtain a final target detection model.
  2. The method for establishing a target detection model according to claim 1, wherein the NetAdapt algorithm optimizes the convolution kernels of the depthwise separable convolutional layer of the intermediate target detection model, and the pruning algorithm optimizes the network structure of the intermediate target detection model.
  3. The method for establishing a target detection model according to claim 1, wherein performing feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model and outputting a feature map comprises:
    performing channel up-dimensioning on the preset digital image using pointwise convolution;
    performing feature extraction on the channel-up-dimensioned preset digital image using depthwise convolution to obtain a plurality of initial feature maps;
    performing channel down-dimensioning on the plurality of initial feature maps using pointwise convolution, and outputting a final feature map.
  4. The method for establishing a target detection model according to claim 1, wherein performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model comprises:
    acquiring the height and width of a first final feature map output by a first depthwise separable convolutional layer;
    acquiring and adjusting the height and width of a second final feature map output by a second depthwise separable convolutional layer so that the height and width of the second final feature map are the same as those of the first final feature map;
    performing channel concatenation and convolution on the adjusted second final feature map and the first final feature map to obtain fused features;
    performing target detection according to the fused features.
  5. The method for establishing a target detection model according to claim 1, wherein optimizing the intermediate target detection model using the NetAdapt algorithm and the pruning algorithm comprises:
    the NetAdapt algorithm optimizing the convolution kernels of one layer of the original depthwise separable convolutional network to obtain a plurality of second depthwise separable convolutional networks;
    the NetAdapt algorithm comparing a second depthwise separable convolutional network with the original depthwise separable convolutional network corresponding to the second depthwise separable convolutional network in terms of latency and accuracy, and selecting a final depthwise separable convolutional network according to the comparison result.
  6. The method for establishing a target detection model according to claim 5, wherein selecting the final depthwise separable convolutional network according to the comparison result comprises:
    when the latency of the second depthwise separable convolutional network is greater than that of the original depthwise separable convolutional network and/or the accuracy of the second depthwise separable convolutional network is lower than that of the original depthwise separable convolutional network, selecting the original depthwise separable convolutional network as the final depthwise separable convolutional network;
    when the latency of the second depthwise separable convolutional network is smaller than that of the original depthwise separable convolutional network and the accuracy of the second depthwise separable convolutional network is higher than that of the original depthwise separable convolutional network, selecting the second depthwise separable convolutional network as the final depthwise separable convolutional network.
  7. The method for establishing a target detection model according to claim 5, wherein the method further comprises:
    the pruning algorithm pruning the network structure of the intermediate target detection model to remove redundant weight parameters from the network structure;
    the pruning algorithm fine-tuning the pruned intermediate target detection model.
  8. The method for establishing a target detection model according to claim 7, wherein removing the redundant weight parameters from the network structure comprises:
    encoding the network structure based on the number of channels in each layer of the pruned intermediate target detection model to obtain a plurality of encoding vectors;
    inputting the encoding vectors into the intermediate target detection model to generate pruned network weights;
    obtaining the performance of the pruned intermediate target detection model according to the network structure, the network weights, and a preset validation set;
    filtering the plurality of encoding vectors using an evolutionary algorithm to obtain a final encoding vector, and obtaining the final target detection model according to the final encoding vector.
  9. The method for establishing a target detection model according to claim 1, wherein the initial target detection model further comprises a system loss function, the system loss function comprising a bounding-box coordinate error function, a bounding-box confidence error function, and a classification error function.
  10. An application method of a target detection model, wherein the method comprises:
    acquiring an actual digital image, and inputting the actual digital image into the target detection model;
    performing feature extraction on the actual digital image through the depthwise separable convolutional layer of the target detection model, and outputting a feature map;
    performing target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  11. An apparatus for establishing a target detection model, wherein the apparatus comprises:
    a network modification module, configured to acquire a base target detection network, replace ordinary convolutional layers of the base target detection network with depthwise separable convolutional layers, and add a multi-scale feature fusion mechanism to the base target detection network to obtain an initial target detection model;
    a digital image acquisition module, configured to acquire a preset digital image and input the preset digital image into the initial target detection model;
    a feature extraction module, configured to perform feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model and output a feature map;
    a target detection module, configured to perform target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
    a model optimization module, configured to optimize the intermediate target detection model using a NetAdapt algorithm and a pruning algorithm to obtain a final target detection model.
  12. A target detection apparatus, wherein the apparatus comprises:
    a digital image acquisition module, configured to acquire an actual digital image and input the actual digital image into the target detection model;
    a feature extraction module, configured to perform feature extraction on the actual digital image through the depthwise separable convolutional layer of the target detection model and output a feature map;
    a target detection module, configured to perform target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  13. A target detection device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements a method for establishing a target detection model and/or an application method of a target detection model:
    wherein the method for establishing a target detection model comprises:
    acquiring a base target detection network, replacing ordinary convolutional layers of the base target detection network with depthwise separable convolutional layers, and adding a multi-scale feature fusion mechanism to the base target detection network to obtain an initial target detection model;
    acquiring a preset digital image, and inputting the preset digital image into the initial target detection model;
    performing feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model, and outputting a feature map;
    performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
    optimizing the intermediate target detection model using a NetAdapt algorithm and a pruning algorithm to obtain a final target detection model;
    wherein the application method of the target detection model comprises:
    acquiring an actual digital image, and inputting the actual digital image into the target detection model;
    performing feature extraction on the actual digital image through the depthwise separable convolutional layer of the target detection model, and outputting a feature map;
    performing target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
  14. The target detection device according to claim 13, wherein the NetAdapt algorithm optimizes the convolution kernels of the depthwise separable convolutional layer of the intermediate target detection model, and the pruning algorithm optimizes the network structure of the intermediate target detection model.
  15. The target detection device according to claim 13, wherein performing feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model and outputting a feature map comprises:
    performing channel up-dimensioning on the preset digital image using pointwise convolution;
    performing feature extraction on the channel-up-dimensioned preset digital image using depthwise convolution to obtain a plurality of initial feature maps;
    performing channel down-dimensioning on the plurality of initial feature maps using pointwise convolution, and outputting a final feature map.
  16. The target detection device according to claim 13, wherein performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model comprises:
    acquiring the height and width of a first final feature map output by a first depthwise separable convolutional layer;
    acquiring and adjusting the height and width of a second final feature map output by a second depthwise separable convolutional layer so that the height and width of the second final feature map are the same as those of the first final feature map;
    performing channel concatenation and convolution on the adjusted second final feature map and the first final feature map to obtain fused features;
    performing target detection according to the fused features.
  17. The target detection device according to claim 13, wherein optimizing the intermediate target detection model using the NetAdapt algorithm and the pruning algorithm comprises:
    the NetAdapt algorithm optimizing the convolution kernels of one layer of the original depthwise separable convolutional network to obtain a plurality of second depthwise separable convolutional networks;
    the NetAdapt algorithm comparing a second depthwise separable convolutional network with the original depthwise separable convolutional network corresponding to the second depthwise separable convolutional network in terms of latency and accuracy, and selecting a final depthwise separable convolutional network according to the comparison result.
  18. The target detection device according to claim 17, wherein selecting the final depthwise separable convolutional network according to the comparison result comprises:
    when the latency of the second depthwise separable convolutional network is greater than that of the original depthwise separable convolutional network and/or the accuracy of the second depthwise separable convolutional network is lower than that of the original depthwise separable convolutional network, selecting the original depthwise separable convolutional network as the final depthwise separable convolutional network;
    when the latency of the second depthwise separable convolutional network is smaller than that of the original depthwise separable convolutional network and the accuracy of the second depthwise separable convolutional network is higher than that of the original depthwise separable convolutional network, selecting the second depthwise separable convolutional network as the final depthwise separable convolutional network.
  19. The target detection device according to claim 17, wherein the method for establishing a target detection model further comprises:
    the pruning algorithm pruning the network structure of the intermediate target detection model to remove redundant weight parameters from the network structure;
    the pruning algorithm fine-tuning the pruned intermediate target detection model.
  20. A computer-readable storage medium, wherein computer-executable instructions are stored thereon, the computer-executable instructions being used to perform a method for establishing a target detection model and/or an application method of a target detection model:
    wherein the method for establishing a target detection model comprises:
    acquiring a base target detection network, replacing ordinary convolutional layers of the base target detection network with depthwise separable convolutional layers, and adding a multi-scale feature fusion mechanism to the base target detection network to obtain an initial target detection model;
    acquiring a preset digital image, and inputting the preset digital image into the initial target detection model;
    performing feature extraction on the preset digital image through the depthwise separable convolutional layer of the initial target detection model, and outputting a feature map;
    performing target detection on the feature map through the multi-scale feature fusion mechanism of the initial target detection model to obtain an intermediate target detection model;
    optimizing the intermediate target detection model using a NetAdapt algorithm and a pruning algorithm to obtain a final target detection model;
    wherein the application method of the target detection model comprises:
    acquiring an actual digital image, and inputting the actual digital image into the target detection model;
    performing feature extraction on the actual digital image through the depthwise separable convolutional layer of the target detection model, and outputting a feature map;
    performing target detection on the feature map through the multi-scale feature fusion mechanism of the target detection model.
PCT/CN2022/090664 2022-03-15 2022-04-29 Method for establishing target detection model, application method, device, apparatus and medium WO2023173552A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210254685.7A CN114627282A (zh) 2022-03-15 2022-03-15 Method for establishing target detection model, application method, device, apparatus and medium
CN202210254685.7 2022-03-15

Publications (1)

Publication Number Publication Date
WO2023173552A1 true WO2023173552A1 (zh) 2023-09-21

Family

ID=81901213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090664 WO2023173552A1 (zh) 2022-03-15 2022-04-29 目标检测模型的建立方法、应用方法、设备、装置及介质

Country Status (2)

Country Link
CN (1) CN114627282A (zh)
WO (1) WO2023173552A1 (zh)


Also Published As

Publication number Publication date
CN114627282A (zh) 2022-06-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931579

Country of ref document: EP

Kind code of ref document: A1