WO2021000404A1 - Target detection method and electronic device based on deep learning - Google Patents

Target detection method and electronic device based on deep learning

Info

Publication number
WO2021000404A1
WO2021000404A1 (PCT/CN2019/102842; CN2019102842W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
image feature
pooling
layer
convolutional
Prior art date
Application number
PCT/CN2019/102842
Other languages
English (en)
French (fr)
Inventor
王健宗
贾雪丽
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021000404A1 publication Critical patent/WO2021000404A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • This application relates to the technical field of neural networks, and in particular to a target detection method, electronic device, computer equipment and readable storage medium based on deep learning.
  • Glioma cells are malignant tumor cells and the most common type of malignant tumor cell in the brain. Their incidence is higher than that of other brain tumors, and gliomas readily recur. Detecting glioma cells early through target detection is therefore of great significance to patients' lives and health.
  • Target detection determines whether a target to be detected is present in a picture and, when it is, determines the target's position.
  • Related technologies include Region Proposal Convolutional Neural Networks (RCNN), Fast RCNN, and Faster RCNN networks.
  • the RCNN and Fast RCNN networks use the Selective Search algorithm to generate target detection frames.
  • the algorithm randomly generates a large number of target detection frames and randomly detects target features, making it a dense detection method.
  • the Selective Search algorithm is not accurate enough when detecting frames for overlapping objects, and it consumes a lot of time.
  • the target frame generation mode (the anchor method) adopted by Faster RCNN has superior performance.
  • the anchor method generates target detection frames for each point on the feature map, using a uniform rule for every point. Compared with the Selective Search algorithm, the anchor method generates fewer target detection frames and identifies objects more accurately. The Faster RCNN network therefore provides strong support for target detection tasks.
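As an illustration of the anchor mechanism described above, the following sketch enumerates anchors on a small feature map. It is a minimal, hypothetical example, not the patent's implementation; the stride, scales, and aspect ratios are common Faster RCNN defaults assumed for the demonstration.

```python
def generate_anchors(feat_w, feat_h, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate one set of anchor boxes (cx, cy, w, h) for every
    point on a feat_w x feat_h feature map, as the anchor method does."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Map the feature-map point back to image coordinates.
            cx, cy = x * stride + stride // 2, y * stride + stride // 2
            for s in scales:
                for r in ratios:
                    # Keep the anchor area roughly s*s while varying aspect ratio.
                    w = s * (r ** 0.5)
                    h = s / (r ** 0.5)
                    anchors.append((cx, cy, w, h))
    return anchors

boxes = generate_anchors(4, 3)
# Every point yields len(scales) * len(ratios) = 9 anchors.
print(len(boxes))  # 4 * 3 * 9 = 108
```

Because every point gets the same fixed set of boxes, the number of proposals is bounded by the feature-map size, which is the sparsity advantage over Selective Search noted above.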
  • however, the inventors realized that the Visual Geometry Group (VGG16) network used in Faster RCNN is a standard fully convolutional neural network model with significant image invariance: the semantic expression of the image does not vary with position. This yields excellent performance in classification tasks.
  • for segmentation and target detection tasks, however, image invariance means that image features capture only the approximate position of the image's abstract semantic expression while detailed features are ignored, causing a loss of detail.
  • because the many pooling operations and transposed convolution (deconvolution) operations used in convolutional neural networks lose the image's detailed features, the feature maps obtained through the VGG16 convolutional network are not accurate enough.
  • therefore, this application aims to solve the problem that feature maps obtained through the VGG16 convolutional network are not accurate enough.
  • this application provides a method for target detection based on deep learning, the method including:
  • obtaining a picture to be detected;
  • inputting the picture into an improved VGG16 network for image feature extraction;
  • inputting the image features into a Region of Interest Pooling (ROI Pooling) network for pooling;
  • inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and
  • inputting the fine-tuned result into a Region Proposal Network (RPN) and then through a Fully Connected (FC) layer network to classify the target and background and obtain the target's category information and position information.
  • this application also provides an electronic device, including:
  • the obtaining module is used to obtain the picture to be detected
  • the extraction module is used to input the picture into the improved VGG16 network for image feature extraction
  • the pooling module is used to input the image features into the ROI Pooling network for pooling;
  • the adjustment module is used to input the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and
  • the classification module is used to input the fine adjustment result into the RPN network, and then pass through the fully connected layer network to classify the target and the background to obtain the category information and location information of the target.
  • the present application also provides a computer device, the computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor; when the computer-readable instructions are executed by the processor, the following steps are implemented:
  • obtaining a picture to be detected; inputting the picture into the improved VGG16 network for image feature extraction; inputting the image features into the ROI Pooling network for pooling; inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and inputting the fine-tuned result into the RPN network and then through the fully connected layer network to classify the target and background and obtain the category information and position information of the target.
  • the present application also provides a non-volatile computer-readable storage medium in which computer-readable instructions are stored; the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes the following steps:
  • obtaining a picture to be detected; inputting the picture into the improved VGG16 network for image feature extraction; inputting the image features into the ROI Pooling network for pooling; inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and inputting the fine-tuned result into the RPN network and then through the fully connected layer network to classify the target and background and obtain the category information and position information of the target.
  • the target detection method, electronic device, computer equipment, and non-volatile computer-readable storage medium based on deep learning provided in this application obtain the first image feature by applying maximum pooling to the first-layer convolutional image features, set the third-layer convolutional image features as the second image feature, and obtain the third image feature by applying transposed convolution to the fifth-layer convolutional image features.
  • the first, second, and third image features are normalized, and the resulting first, second, and third normalized images are each passed through a 1*1*42 convolution kernel to adjust the number of channels; the adjusted results are stacked along the channel dimension. The resulting image features are then input into the ROI Pooling network, followed by a 3*3 convolution kernel for fine adjustment, and finally an RPN network and a fully connected layer for classification.
  • FIG. 1 is a schematic diagram of the improved VGG16 network of this application;
  • FIG. 2 is a flowchart of the steps of the target detection method of Embodiment 1 of this application;
  • FIG. 3 is a schematic diagram of the hardware architecture of the electronic device of Embodiment 2 of this application;
  • FIG. 4 is a schematic diagram of the program modules of the target detection system of Embodiment 3 of this application.
  • FIG. 2 shows a flowchart of the steps of the target detection method of Embodiment 1 of this application. It can be understood that the flowchart in this method embodiment is not intended to limit the order in which the steps are executed. It should be noted that in this embodiment the electronic device 2 serves as the execution subject for exemplary description. The details are as follows:
  • Step S100: Obtain a picture to be detected.
  • in a specific embodiment, an imaging picture containing glioma is obtained by means of CT, MRI, etc., and is input into the electronic device 2, which obtains the imaging picture; for example, the size of the imaging picture is 800*600.
  • Step S102: Input the picture into the improved VGG16 network for image feature extraction.
  • before inputting the picture into the improved VGG16 network for image feature extraction, the electronic device 2 needs to establish the improved VGG16 network.
  • FIG. 1 shows a schematic diagram of the improved VGG16 network of the present application.
  • the improved VGG16 network includes 5 convolutional layers, 6 pooling layers, and 1 transposed convolutional layer; a pooling layer is set between each pair of adjacent convolutional layers, 2 pooling layers are set after the first convolutional layer, and the transposed convolutional layer is set after the fifth convolutional layer. The pooling layers are maximum pooling layers, and a nonlinear activation function is set after each convolutional layer.
  • after the acquisition module 201 acquires the picture to be detected, the picture first passes through the 5 convolutional layers and 4 pooling layers to obtain the convolutional image features of each layer, which are stored in the database. Then maximum pooling is applied to the first-layer convolutional image features to obtain the first image feature, and the third-layer convolutional image features are set as the second image feature, which serves as the standard image feature; transposed convolution is applied to the fifth-layer convolutional image features to obtain the third image feature, whose size (width and height) is consistent with that of the first and second image features.
  • the first image feature, the second image feature, and the third image feature are then normalized to obtain a first normalized image, a second normalized image, and a third normalized image, respectively, so that each conforms to a standard normal distribution; the three normalized images are each passed through a 1*1*42 convolution kernel to adjust the number of channels, and the adjusted results are stacked along the channel dimension.
  • illustratively, the improved VGG16 network includes 5 convolutional layers with kernel_size=3 and pad=1, pooling layers with kernel_size=2 and stride=2, and 1 transposed convolutional layer with kernel_size=3 and pad=1; the convolutional layers are set so that the image size does not change, and the pooling layer parameters are set so that the image size is halved. For an input picture of 800*600, the size is 800*600 after CONV1, 400*300 after POOLING1, 400*300 after CONV2, 200*150 after POOLING2, 200*150 after CONV3, 100*75 after POOLING3, 100*75 after CONV4, 50*38 after POOLING4, and 50*38 after CONV5.
  • the first, third, and fifth convolutional layers are selected for the multi-scale operation, and the 200*150 image features of the third convolutional layer (CONV3) are set as the reference image feature (i.e., the second image feature).
  • after this processing, the first image feature, the second image feature, and the third image feature are all 200*150 in size.
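The size bookkeeping above can be checked with a short sketch. This is only an illustration of the arithmetic (kernel_size=3, pad=1 convolutions preserve size; 2*2 stride-2 pooling halves it, rounding up, so 75 becomes 38), not the network itself.

```python
from math import ceil

def trace_improved_vgg16(w, h):
    """Trace the spatial size through CONV1..CONV5 with a pooling
    layer after each of the first four conv layers (conv preserves
    size; 2x2 stride-2 max pooling halves it, rounding up)."""
    sizes = {}
    for i in range(1, 6):
        sizes[f"CONV{i}"] = (w, h)           # kernel_size=3, pad=1: unchanged
        if i < 5:
            w, h = ceil(w / 2), ceil(h / 2)  # POOLING1..POOLING4
            sizes[f"POOLING{i}"] = (w, h)
    return sizes

s = trace_improved_vgg16(800, 600)
print(s["CONV3"], s["CONV5"])  # (200, 150) (50, 38)

# Multi-scale branch: CONV1 pooled twice -> 200x150, matching CONV3;
# CONV5 is upsampled back to 200x150 by two transposed convolutions.
conv1 = s["CONV1"]
first_feature = (ceil(conv1[0] / 4), ceil(conv1[1] / 4))
print(first_feature)  # (200, 150)
```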
  • in a specific embodiment, the first image feature, the second image feature, and the third image feature are each input into a batch normalization (BN) layer to be normalized, yielding a first normalized image, a second normalized image, and a third normalized image, respectively, so that all three conform to a standard normal distribution.
  • the normalized first, second, and third normalized images are each passed through a 1*1*42 convolution kernel to adjust the number of channels, and the adjusted results are stacked along the third (channel) dimension, so that the number of channels becomes 3 times the original.
  • for example, after three image features of size 200*150 pass through a 1*1*42 convolution kernel, three 200*150*42 image features are obtained; stacking them along the third dimension yields a 200*150*126 image feature.
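At the shape level, the normalize → adjust channels → stack pipeline can be sketched as follows. The per-list standardization stands in for the BN layer (omitting BN's learned scale and shift), and the input channel depths are assumptions for illustration; only the 1*1*42 kernel and the final 126-channel stack come from the text.

```python
from statistics import mean, pstdev

def normalize(values):
    """Standardize a flat list of activations to zero mean and unit
    variance, as the BN layer does per channel (BN's learned
    scale/shift parameters are omitted in this sketch)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def conv1x1_channels(shape, out_channels=42):
    """A 1*1 convolution changes only the channel count."""
    w, h, _ = shape
    return (w, h, out_channels)

def stack_channels(shapes):
    """Concatenate feature maps along the channel dimension."""
    w, h, _ = shapes[0]
    return (w, h, sum(c for _, _, c in shapes))

# Assumed channel depths for the three 200*150 multi-scale features.
features = [(200, 150, 64), (200, 150, 256), (200, 150, 512)]
adjusted = [conv1x1_channels(f) for f in features]
print(stack_channels(adjusted))  # (200, 150, 126)
```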
  • Step S104: Input the image features into the ROI Pooling network for pooling.
  • the ROI Pooling network performs only a pooling operation.
  • there are two existing ROI Pooling methods: the first uses the SAME method, i.e., padding with zeros first to change the input image features into an image of equal length and width before pooling.
  • the second uses a kernel size with different length and width.
  • in this application, a kernel size of 4*3 is used on the input 200*150*126 image features, and after ROI Pooling the result is 13*13*126.
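Whatever the kernel bookkeeping, the fixed 13*13 output can be reproduced with adaptive max pooling: divide the input into a 13*13 grid of bins and take the maximum in each. The following is a sketch of that pooling idea, not the patent's exact 4*3-kernel arithmetic.

```python
def adaptive_max_pool(feature, out_w, out_h):
    """Max-pool a 2D feature map (list of rows) into a fixed
    out_h x out_w grid, whatever the input size."""
    in_h, in_w = len(feature), len(feature[0])
    out = []
    for by in range(out_h):
        y0, y1 = by * in_h // out_h, (by + 1) * in_h // out_h
        row = []
        for bx in range(out_w):
            x0, x1 = bx * in_w // out_w, (bx + 1) * in_w // out_w
            # Guard against empty bins when the input is small.
            row.append(max(feature[y][x]
                           for y in range(y0, max(y1, y0 + 1))
                           for x in range(x0, max(x1, x0 + 1))))
        out.append(row)
    return out

# A 150x200 single-channel map (value = x + y) pooled to 13x13;
# per the text, each of the 126 channels would be pooled the same way.
fmap = [[x + y for x in range(200)] for y in range(150)]
pooled = adaptive_max_pool(fmap, 13, 13)
print(len(pooled), len(pooled[0]))  # 13 13
```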
  • Step S106: Input the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result.
  • the image features in the pooling result after the ROI Pooling network are 13*13*126.
  • the 13*13*126 image features are passed through a 3*3*4 convolution kernel for fine adjustment.
  • the adjusted result is a 13*13*4 image feature.
  • the 3*3*4 convolution kernel increases the robustness of the entire system.
  • at the same time, the 4-channel output effectively reduces dimensionality, greatly reducing the number of model parameters and thus the time complexity of the entire network.
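The parameter saving claimed for the 4-channel output is simple arithmetic. The sketch below (biases ignored) compares the 3*3 kernel mapping 126 channels down to 4 against a hypothetical variant that keeps all 126 output channels:

```python
def conv_params(k_w, k_h, in_ch, out_ch):
    """Weight count of a convolution layer, ignoring biases."""
    return k_w * k_h * in_ch * out_ch

reduced = conv_params(3, 3, 126, 4)    # the 3*3*4 fine-tuning kernel
full = conv_params(3, 3, 126, 126)     # hypothetical 126-channel output
print(reduced, full, full // reduced)  # 4536 142884 31
```

Under these assumptions the 4-channel kernel uses roughly 31 times fewer weights for this layer.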
  • Step S108: Input the fine-tuned result into the RPN network and then through the Fully Connected (FC) layer network to classify the target and background and obtain the category information and position information of the target.
  • for example, the fine-tuned result is input into the RPN network; the fully connected layer network (a 3*3*63 convolution kernel) then processes the fine-tuned image features and, based on the heat displayed in the processing result, determines whether the target in the image to be detected is glioma or background, together with the target's position information: areas showing higher heat are glioma, and areas showing lower heat are background.
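The decision rule above, higher heat means glioma and lower heat means background, can be sketched as thresholding over a heat map. The 0.5 threshold and the toy heat values are assumptions for illustration; the patent does not specify them.

```python
def label_heat_map(heat, threshold=0.5):
    """Label each position as 'glioma' (high heat) or 'background'
    (low heat), and collect the positions of glioma regions."""
    labels, positions = [], []
    for y, row in enumerate(heat):
        labels.append([])
        for x, h in enumerate(row):
            if h > threshold:
                labels[-1].append("glioma")
                positions.append((x, y))   # location information
            else:
                labels[-1].append("background")
    return labels, positions

heat = [[0.1, 0.9, 0.2],
        [0.8, 0.95, 0.1]]
labels, positions = label_heat_map(heat)
print(positions)  # [(1, 0), (0, 1), (1, 1)]
```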
  • FIG. 3 shows a schematic diagram of the hardware architecture of the electronic device of the second application.
  • the electronic device 2 includes, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can be communicatively connected to each other through a system bus.
  • FIG. 3 only shows the electronic device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the memory 21 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the memory 21 may be an internal storage unit of the electronic device 2, such as a hard disk or a memory of the electronic device 2.
  • the memory may also be an external storage device of the electronic device 2, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 2.
  • the memory 21 may also include both the internal storage unit of the electronic device 2 and its external storage device.
  • the memory 21 is generally used to store the operating system and various application software installed in the electronic device 2, such as the program code of the target detection system 20.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • in some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • the processor 22 is generally used to control the overall operation of the electronic device 2.
  • the processor 22 is used to run the program code or process data stored in the memory 21, for example, to run the target detection system 20.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the electronic device 2 and other electronic devices.
  • the network interface 23 is used to connect the electronic device 2 with an external terminal via a network, and establish a data transmission channel and a communication connection between the electronic device 2 and the external terminal.
  • the network may be an intranet, the Internet, a Global System for Mobile Communications (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 4 shows a schematic diagram of program modules of the target detection system of the third application of the present application.
  • the target detection system 20 may include or be divided into one or more program modules.
  • the one or more program modules are stored in a storage medium and executed by one or more processors to complete this application and realize the above-mentioned target detection method.
  • the program modules referred to in this application are a series of computer-readable instruction segments capable of completing specific functions, better suited than the program itself to describing the execution process of the target detection system 20 in the storage medium. The following description specifically introduces the functions of each program module in this embodiment:
  • the obtaining module 201 is used to obtain a picture to be detected.
  • in a specific embodiment, an imaging picture containing glioma is acquired by means of CT, MRI, etc., and is input into the electronic device 2; the acquisition module 201 acquires the imaging picture, for example with a size of 800*600.
  • the extraction module 202 is used to input the picture into the improved VGG16 network for image feature extraction.
  • the establishment module 206 needs to establish the improved VGG16 network.
  • Fig. 1 shows a schematic diagram of the improved VGG16 network of the present application.
  • the improved VGG16 network includes 5 convolutional layers, 6 pooling layers, and 1 transposed convolutional layer; a pooling layer is set between each pair of adjacent convolutional layers, 2 pooling layers are set after the first convolutional layer, and the transposed convolutional layer is set after the fifth convolutional layer. The pooling layers are maximum pooling layers, and a nonlinear activation function is set after each convolutional layer.
  • the extraction module 202 first passes the picture through the 5 convolutional layers and 4 pooling layers to obtain the convolutional image features of each layer, and stores them in the database. Then the extraction module 202 applies maximum pooling to the first-layer convolutional image features to obtain the first image feature and sets the third-layer convolutional image features as the second image feature, which serves as the standard image feature; transposed convolution is applied to the fifth-layer convolutional image features to obtain the third image feature, whose size (width and height) is consistent with that of the first and second image features.
  • the extraction module 202 normalizes the first image feature, the second image feature, and the third image feature to obtain a first normalized image, a second normalized image, and a third normalized image, respectively, so that each conforms to a standard normal distribution.
  • the extraction module 202 passes the first normalized image, the second normalized image, and the third normalized image through a 1*1*42 convolution kernel to adjust the number of channels, and stacks the adjusted results along the channel dimension.
  • the convolutional layers are set so that the size of an image passing through them does not change, and the pooling layer parameters are set so that the image size is halved.
  • the extraction module 202 selects the first, third, and fifth convolutional layers for the multi-scale operation and sets the 200*150 image features of the third convolutional layer (CONV3) as the reference image feature (i.e., the second image feature).
  • after this processing, the first image feature, the second image feature, and the third image feature are all 200*150 in size.
  • in a specific embodiment, the extraction module 202 inputs the first image feature, the second image feature, and the third image feature into a batch normalization (BN) layer to normalize them, obtaining a first normalized image, a second normalized image, and a third normalized image, respectively, so that all three conform to a standard normal distribution.
  • the extraction module 202 then passes the normalized first, second, and third normalized images through a 1*1*42 convolution kernel to adjust the number of channels and stacks the adjusted results along the third (channel) dimension, so that the number of channels becomes 3 times the original. For example, three image features of size 200*150 become three 200*150*42 image features after the 1*1*42 convolution kernel; stacking them along the third dimension yields a 200*150*126 image feature.
  • the pooling module 203 is configured to input the image features into the ROI Pooling network for pooling.
  • the ROI Pooling network performs only a pooling operation.
  • there are two existing ROI Pooling methods: the first uses the SAME method, i.e., padding with zeros first to change the input image features into an image of equal length and width before pooling; the second uses a kernel size with different length and width.
  • the pooling module 203 uses a kernel size of 4*3 on the input 200*150*126 image features, and after ROI Pooling the result is 13*13*126.
  • the adjustment module 204 is configured to input the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result.
  • the image features in the pooling result after the ROI Pooling network are 13*13*126.
  • the adjustment module 204 passes the 13*13*126 image features through a 3*3*4 convolution kernel for fine adjustment.
  • the adjusted result is a 13*13*4 image feature.
  • the 3*3*4 convolution kernel increases the robustness of the entire system.
  • at the same time, the 4-channel output effectively reduces dimensionality, greatly reducing the number of model parameters and thus the time complexity of the entire network.
  • the classification module 205 is used to input the fine adjustment result into the RPN network, and then pass through the fully connected layer network to classify the target and the background to obtain the category information and location information of the target.
  • in a specific embodiment, the classification module 205 inputs the fine-tuned result into the RPN network; the fully connected layer network (a 3*3*63 convolution kernel) then processes the fine-tuned image features.
  • based on the heat displayed in the processing result, it determines whether the target in the image to be detected is glioma or background, together with the target's position information: areas showing higher heat are glioma, and areas showing lower heat are background.
  • This application also provides a computer device, such as a smart phone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server, or a server cluster composed of multiple servers).
  • the computer device in this embodiment at least includes, but is not limited to, a memory and a processor that can be communicatively connected to each other through a system bus.
  • This embodiment also provides a non-volatile computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, app store, etc., on which computer-readable instructions are stored; the corresponding functions are realized when the instructions are executed by a processor.
  • the non-volatile computer-readable storage medium of this embodiment is used to store the target detection system 20, and when executed by a processor implements the following steps: obtaining a picture to be detected; inputting the picture into the improved VGG16 network for image feature extraction; inputting the image features into the ROI Pooling network for pooling; inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and inputting the fine-tuned result into the RPN network and then through the fully connected layer network to classify the target and background and obtain the category information and position information of the target.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a target detection method based on deep learning, including: obtaining a picture to be detected; inputting the picture into an improved VGG16 network for image feature extraction; inputting the image features into an ROI Pooling network for pooling; inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and inputting the fine-tuned result into an RPN network and then through a fully connected layer network to classify the target and background and obtain the target's category information and position information.

Description

Target detection method and electronic device based on deep learning
This application claims priority to the Chinese patent application with application number 201910593114.4, filed with the China Patent Office on July 3, 2019 and entitled "Target detection method and electronic device based on deep learning", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of neural networks, and in particular to a target detection method, electronic device, computer equipment, and readable storage medium based on deep learning.
Background
Glioma cells are malignant tumor cells and the most common type of malignant tumor cell in the brain; their incidence is higher than that of other brain tumors, and gliomas readily recur. Detecting glioma cells early through target detection is therefore of great significance to patients' lives and health.
Target detection determines whether a target to be detected is present in a picture and, when it is, determines the target's position. Related technologies include Region Proposal Convolutional Neural Networks (RCNN), Fast RCNN, and Faster RCNN networks. Among them, the RCNN and Fast RCNN networks use the Selective Search algorithm to generate target detection frames; by randomly generating a large number of detection frames and randomly detecting target features, it is a dense detection method. The Selective Search algorithm is not accurate enough when detecting frames for overlapping objects and consumes a lot of time. By comparison, the target frame generation mode (the anchor method) adopted by Faster RCNN has superior performance. The anchor method generates target detection frames for each point on the feature map, using a uniform rule for every point. Compared with the Selective Search algorithm, the anchor method generates fewer target detection frames and identifies objects more accurately. The Faster RCNN network therefore provides strong support for target detection tasks.
However, the inventors realized that the Visual Geometry Group (VGG16) network used in Faster RCNN is a standard fully convolutional neural network model with significant image invariance: the semantic expression of the image does not change as position changes. This yields excellent performance in classification tasks, but for segmentation and target detection tasks, image invariance means that image features capture only the approximate position of the image's abstract semantic expression while detailed features are ignored, causing a loss of detail. Because the many pooling operations and transposed convolution (deconvolution) operations used in convolutional neural networks lose the image's detailed features, the feature maps obtained through the VGG16 convolutional network are not accurate enough.
Therefore, this application aims to solve the problem that feature maps obtained through the VGG16 convolutional network are not accurate enough.
Summary
In view of this, it is necessary to provide a target detection method, electronic device, computer equipment, and non-volatile computer-readable storage medium based on deep learning that can increase the robustness of the system, effectively reduce dimensionality, and reduce the number of parameters, thereby lowering the algorithm's space and time complexity and greatly improving detection accuracy.
To achieve the above objective, this application provides a target detection method based on deep learning, the method including:
obtaining a picture to be detected;
inputting the picture into an improved VGG16 network for image feature extraction;
inputting the image features into a Region of Interest Pooling (ROI Pooling) network for pooling;
inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and
inputting the fine-tuned result into a Region Proposal Network (RPN) and then through a Fully Connected (FC) layer network to classify the target and background and obtain the target's category information and position information.
To achieve the above objective, this application also provides an electronic device, including:
an acquisition module, used to obtain a picture to be detected;
an extraction module, used to input the picture into an improved VGG16 network for image feature extraction;
a pooling module, used to input the image features into an ROI Pooling network for pooling;
an adjustment module, used to input the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and
a classification module, used to input the fine-tuned result into an RPN network and then through a fully connected layer network to classify the target and background and obtain the target's category information and position information.
To achieve the above objective, this application also provides a computer device including a memory, a processor, and computer-readable instructions stored on the memory and executable on the processor; when the computer-readable instructions are executed by the processor, the following steps are implemented:
obtaining a picture to be detected;
inputting the picture into an improved VGG16 network for image feature extraction;
inputting the image features into an ROI Pooling network for pooling;
inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and
inputting the fine-tuned result into an RPN network and then through a fully connected layer network to classify the target and background and obtain the target's category information and position information.
To achieve the above objective, this application also provides a non-volatile computer-readable storage medium in which computer-readable instructions are stored; the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes the following steps:
obtaining a picture to be detected;
inputting the picture into an improved VGG16 network for image feature extraction;
inputting the image features into an ROI Pooling network for pooling;
inputting the pooling result into a 3*3*4 convolution kernel to fine-tune the pooling result; and
inputting the fine-tuned result into an RPN network and then through a fully connected layer network to classify the target and background and obtain the target's category information and position information.
The target detection method, electronic device, computer equipment, and non-volatile computer-readable storage medium based on deep learning provided in this application obtain the first image feature by applying maximum pooling to the first-layer convolutional image features, set the third-layer convolutional image features as the second image feature, and obtain the third image feature by applying transposed convolution to the fifth-layer convolutional image features. The first, second, and third image features are normalized, and the resulting first, second, and third normalized images are each passed through a 1*1*42 convolution kernel to adjust the number of channels; the adjusted results are stacked along the channel dimension. The resulting image features are then input into the ROI Pooling network, followed by a 3*3 convolution kernel for fine adjustment, and finally an RPN network and a fully connected layer for classification. Through this application, the robustness of the system is increased, dimensionality is effectively reduced, and the number of parameters is greatly decreased, thereby lowering the algorithm's space and time complexity and greatly improving detection accuracy.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the improved VGG16 network of this application;
FIG. 2 is a flowchart of the steps of the target detection method of Embodiment 1 of this application;
FIG. 3 is a schematic diagram of the hardware architecture of the electronic device of Embodiment 2 of this application;
FIG. 4 is a schematic diagram of the program modules of the target detection system of Embodiment 3 of this application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of this application.
It should be noted that descriptions involving "first", "second", etc. in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that they can be implemented by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be implemented, the combination should be considered not to exist and not within the scope of protection claimed by this application.
实施例一
参阅图2,示出了本申请一之目标检测方法的步骤流程图。可以理解,本方法实施例中的流程图不用于对执行步骤的顺序进行限定。需要说明是,本实施例以电子装置2为执行主体进行示例性描述。具体如下:
步骤S100,获取待检测的图片。
具体实施例中,通过CT、核磁共振等方式获取具有脑胶质瘤的成像图片,并将该成像图片输入至电子装置2中,电子装置2获取该成像图片,例如:该成像图片大小为800*600。
步骤S102,将所述图片输入至改进型VGG16网络中进行图像特征提取。
在一较佳实施例中,在将所述图片输入至改进型VGG16网络中进行图像特征提取之前,所述电子装置2需建立所述改进型VGG16网络。请参阅图1, 示出了本申请之改进型VGG16网络的示意图。所述改进型VGG16网络包括5个卷积层、6个池化层及1个转换卷积层,其中,所述5个卷积层之间均设置有1个池化层,第1个卷积层后面设置2个池化层,第5个卷积层后面设置所述转换卷积层,所述池化层为最大池化层,在每个卷积层之后还设置有非线性激活函数。具体地,所述获取模块201在获取到待检测的图片之后,将所述图片首先经过5个卷积层以及4个池化层,以分别获取各层卷积图像特征,并将所述各层卷积图像特征存储于数据库中。然后,将第一层卷积图像特征进行最大池化处理以获取第一图像特征,将第三层卷积图像特征设定为第二图像特征,其中所述第二图像特征为标准图像特征,将第五层卷积图像特征进行转换卷积处理以获取第三图像特征,其中,所述第三图像特征的大小与所述第一图像特征及所述第二图像特征的大小(宽及高)一致。然后,将所述第一图像特征、所述标准图像特征及所述第二图像特征分别进行归一化处理,以分别获得第一归一图像、第二归一图像及第三归一图像,以使所述第一归一图像、所述第二归一图像及所述第三归一图像符合标准正态分布,将所述第一归一图像、所述第二归一图像及所述第三归一图像分别通过1*1*42的卷积核以调整通道数,然后将调整结果进行通道数堆叠操作。
Illustratively, the modified VGG16 network comprises 5 convolutional layers with kernel size 3 and feature-map padding 1 (i.e., kernel_size=3, pad=1), pooling layers with kernel size 2 and stride 2 (i.e., kernel_size=2, stride=2), and 1 transposed convolutional layer with kernel size 3 and padding 1 (i.e., kernel_size=3, pad=1). The convolutional layers are configured so that the image size does not change when passing through them, and the pooling-layer parameters are set so that the image size is halved. When the input picture is 800*600, it becomes 800*600 after the first convolutional layer (CONV1), 400*300 after the first pooling layer (POOLING1), 400*300 after the second convolutional layer (CONV2), 200*150 after the second pooling layer (POOLING2), 200*150 after the third convolutional layer (CONV3), 100*75 after the third pooling layer (POOLING3), 100*75 after the fourth convolutional layer (CONV4), 50*38 after the fourth pooling layer (POOLING4), and 50*38 after the fifth convolutional layer (CONV5).
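The size progression above follows from standard convolution and pooling output-size arithmetic; a short sketch (assuming ceil-mode pooling, which is what makes 75 reduce to 38 rather than 37) reproduces the trace:

```python
import math

def conv_out(size, kernel=3, pad=1, stride=1):
    # Output size of a convolution: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Max pooling with ceil mode, so 75 -> 38 as in the text
    return math.ceil((size - kernel) / stride) + 1

w, h = 800, 600
trace = []
for layer in range(5):                  # CONV1..CONV5
    w, h = conv_out(w), conv_out(h)     # kernel_size=3, pad=1 keeps size
    trace.append((w, h))
    if layer < 4:                       # POOLING1..POOLING4 halve the size
        w, h = pool_out(w), pool_out(h)
        trace.append((w, h))

print(trace)
```

Running this yields the same sequence as the text, ending at 50*38 after CONV5.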
The first, third, and fifth convolutional layers are then selected for the multi-scale operation, and the 200*150 image feature of the third convolutional layer (CONV3) is set as the reference image feature (i.e., the second image feature). The 800*600 image feature of the first convolutional layer (CONV1) is input into 2 pooling layers with kernel size 2 and stride 2 (i.e., kernel_size=2, stride=2) to max-pool the 800*600 image feature and obtain the first image feature. The 50*38 image feature of the fifth convolutional layer (CONV5) is input into 2 transposed convolutional layers to transposed-convolve the 50*38 image feature and obtain the third image feature. After processing, the first, second, and third image features are all of size 200*150.
The first, second, and third image features are then each input into a batch normalization (BN) layer to be normalized, yielding a first, a second, and a third normalized image respectively, so that the three normalized images all conform to the standard normal distribution.
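As an illustration of the normalization step, a minimal per-channel normalization in NumPy (inference-style; a full BN layer would add learned scale/shift parameters and running statistics, which are omitted here):

```python
import numpy as np

def channel_normalize(x, eps=1e-5):
    # Normalize each channel of an (H, W, C) feature map to zero mean
    # and unit variance over the spatial dimensions
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical 200*150*42 feature map with arbitrary scale and offset
feat = np.random.rand(200, 150, 42) * 10 + 5
normed = channel_normalize(feat)
print(normed.mean(), normed.std())
```

After normalization the feature values have mean approximately 0 and standard deviation approximately 1, as the standard normal distribution requires.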
Finally, the normalized first, second, and third normalized images are each passed through a 1*1*42 convolution kernel to adjust the channel number, and the adjusted results are stacked along the third dimension (i.e., the channel dimension), tripling the channel number. For example, three 200*150 image features each pass through a 1*1*42 convolution kernel, yielding three 200*150*42 image features, which are then stacked along the third dimension to give a 200*150*126 image feature.
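The channel-stacking operation can be sketched with NumPy; the zero tensors below stand in for the three hypothetical 200*150*42 outputs of the 1*1*42 kernels:

```python
import numpy as np

# Placeholders for the three multi-scale branches, shape (H, W, C)
f1 = np.zeros((200, 150, 42))  # from CONV1 after two extra max-pools
f2 = np.zeros((200, 150, 42))  # from CONV3 (the reference scale)
f3 = np.zeros((200, 150, 42))  # from CONV5 after two transposed convs

# Stack along the channel axis: the channel count triples, 42 -> 126
fused = np.concatenate([f1, f2, f3], axis=2)
print(fused.shape)
```

The spatial size is unchanged and only the channel dimension grows, which matches the 200*150*126 result in the text.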
Step S104: input the image features into an ROI Pooling network for pooling.
It should be noted that the ROI Pooling network performs only a pooling operation. Existing ROI Pooling has two pooling schemes: the first uses the SAME approach, i.e., zero-padding the input image feature into an image of equal width and height before pooling; the second uses a kernel whose width and height differ. In the present application, a 4*3 kernel size is applied to the input 200*150*126 image feature, and after ROI Pooling the result is 13*13*126.
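The 200*150 to 13*13 reduction is most naturally read as adaptive (bin-wise) max pooling, where each of the 13*13 output cells takes the maximum over a rectangular, generally non-square bin of the input; the exact kernel bookkeeping in the original is not spelled out, so the following is a sketch under that assumption:

```python
import numpy as np

def roi_max_pool(feat, out_h=13, out_w=13):
    """Adaptive max pooling: divide the H*W plane into out_h*out_w bins
    (non-square bins for non-square inputs) and take the max per bin."""
    h, w, c = feat.shape
    out = np.zeros((out_h, out_w, c))
    for i in range(out_h):
        for j in range(out_w):
            y0, y1 = (i * h) // out_h, ((i + 1) * h) // out_h
            x0, x1 = (j * w) // out_w, ((j + 1) * w) // out_w
            out[i, j] = feat[y0:y1, x0:x1].max(axis=(0, 1))
    return out

pooled = roi_max_pool(np.random.rand(200, 150, 126))
print(pooled.shape)
```

Whatever the bin geometry, the defining property is a fixed 13*13 spatial output with the 126 channels untouched.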
Step S106: input the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result.
Specifically, the image feature in the pooling result after the ROI Pooling network is 13*13*126. This 13*13*126 image feature is passed through a 3*3*4 convolution kernel for fine adjustment, yielding a 13*13*4 image feature. The 3*3*4 convolution kernel increases the robustness of the whole system; at the same time, the 4-channel output effectively reduces dimensionality, greatly decreasing the number of model parameters and thus the time complexity of the whole network.
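The parameter saving claimed for the 4-channel reduction can be checked with simple weight-count arithmetic (weights only, biases omitted):

```python
# A conv layer mapping in_ch channels to out_ch channels with a k*k kernel
# has k * k * in_ch * out_ch weights.
in_ch, out_ch, k = 126, 4, 3
reduced = k * k * in_ch * out_ch    # 3*3 kernel, 126 -> 4 channels
unreduced = k * k * in_ch * in_ch   # same kernel keeping all 126 channels
print(reduced, unreduced)
```

Reducing to 4 output channels cuts the layer's weights from 142884 to 4536, roughly a 31x saving, which is the dimensionality-reduction effect described above.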
Step S108: input the fine-adjusted result into an RPN network and then through a fully connected (FC) layer network to classify the target and the background, so as to obtain category information and position information of the target.
For example, the fine-adjusted result is input into the RPN network, after which the fully connected layer network (a 3*3*63 convolution kernel) processes the fine-adjusted image feature. The heat shown in the processing result determines whether the target in the image to be detected is a brain glioma or background, as well as the position information of the target: regions of higher heat are brain glioma, and regions of lower heat are background.
The present application increases the robustness of the system, effectively reduces dimensionality, and greatly decreases the number of parameters, thereby lowering the space and time complexity of the algorithm and markedly improving detection accuracy.
Embodiment 2
Referring to FIG. 3, a schematic diagram of the hardware architecture of the electronic device of Embodiment 2 of the present application is shown. The electronic device 2 includes, but is not limited to, a memory 21, a processor 22, and a network interface 23, which are communicatively connected to one another through a system bus. FIG. 3 shows only the electronic device 2 with components 21-23; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 21 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as a hard disk or internal memory of the electronic device 2. In other embodiments, the memory may also be an external storage device of the electronic device 2, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or Flash Card equipped on the electronic device 2. Of course, the memory 21 may also include both the internal storage unit and the external storage device of the electronic device 2. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the electronic device 2, such as the program code of the target detection system 20. In addition, the memory 21 may also be used to temporarily store various data that has been output or is to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2. In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or to process data, for example to run the target detection system 20.
The network interface 23 may include a wireless network interface or a wired network interface and is generally used to establish communication connections between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 to an external terminal through a network and to establish data transmission channels and communication connections between the electronic device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
Embodiment 3
Referring to FIG. 4, a schematic diagram of the program modules of the target detection system of Embodiment 3 of the present application is shown. In this embodiment, the target detection system 20 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the present application and implement the above target detection method. A program module in the present application refers to a series of computer-readable instruction segments capable of completing a specific function and is better suited than the program itself to describing the execution of the target detection system 20 in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
The acquisition module 201 is used to acquire a picture to be detected.
In a specific embodiment, an image showing a brain glioma is acquired by CT, magnetic resonance imaging, or the like, and the image is input into the electronic device 2; the acquisition module 201 acquires the image. For example, the image size is 800*600.
The extraction module 202 is used to input the picture into the modified VGG16 network for image feature extraction.
In a preferred embodiment, before the modified VGG16 network is used, the building module 206 needs to build the modified VGG16 network. Referring to FIG. 1, a schematic diagram of the modified VGG16 network of the present application is shown. The modified VGG16 network comprises 5 convolutional layers, 6 pooling layers, and 1 transposed convolutional layer, wherein 1 pooling layer is arranged between each pair of adjacent convolutional layers, 2 additional pooling layers are arranged after the 1st convolutional layer, the transposed convolutional layer is arranged after the 5th convolutional layer, each pooling layer is a max pooling layer, and a nonlinear activation function follows each convolutional layer. Specifically, after the acquisition module 201 acquires the picture to be detected, the extraction module 202 first passes the picture through the 5 convolutional layers and 4 pooling layers to obtain the convolutional image feature of each layer and stores the convolutional image features of the layers in a database. The extraction module 202 then max-pools the first-layer convolutional image feature to obtain a first image feature, sets the third-layer convolutional image feature as a second image feature (the reference image feature), and transposed-convolves the fifth-layer convolutional image feature to obtain a third image feature whose size (width and height) is consistent with that of the first and second image features. The extraction module 202 then normalizes the first, second, and third image features to obtain a first, a second, and a third normalized image respectively, so that the three normalized images conform to the standard normal distribution. Finally, the extraction module 202 passes the three normalized images each through a 1*1*42 convolution kernel to adjust the channel number and stacks the adjusted results along the channel dimension.
Illustratively, the modified VGG16 network comprises 5 convolutional layers with kernel size 3 and feature-map padding 1 (i.e., kernel_size=3, pad=1), pooling layers with kernel size 2 and stride 2 (i.e., kernel_size=2, stride=2), and 1 transposed convolutional layer with kernel size 3 and padding 1 (i.e., kernel_size=3, pad=1). The convolutional layers are configured so that the image size does not change when passing through them, and the pooling-layer parameters are set so that the image size is halved. When the input picture is 800*600, it becomes 800*600 after the first convolutional layer (CONV1), 400*300 after the first pooling layer (POOLING1), 400*300 after the second convolutional layer (CONV2), 200*150 after the second pooling layer (POOLING2), 200*150 after the third convolutional layer (CONV3), 100*75 after the third pooling layer (POOLING3), 100*75 after the fourth convolutional layer (CONV4), 50*38 after the fourth pooling layer (POOLING4), and 50*38 after the fifth convolutional layer (CONV5).
The extraction module 202 then selects the first, third, and fifth convolutional layers for the multi-scale operation and sets the 200*150 image feature of the third convolutional layer (CONV3) as the reference image feature (i.e., the second image feature). The 800*600 image feature of the first convolutional layer (CONV1) is input into 2 pooling layers with kernel size 2 and stride 2 (i.e., kernel_size=2, stride=2) to max-pool the 800*600 image feature and obtain the first image feature. The 50*38 image feature of the fifth convolutional layer (CONV5) is input into 2 transposed convolutional layers to transposed-convolve the 50*38 image feature and obtain the third image feature. After processing, the first, second, and third image features are all of size 200*150.
The extraction module 202 then inputs the first, second, and third image features each into a batch normalization (BN) layer to be normalized, yielding a first, a second, and a third normalized image respectively, so that the three normalized images all conform to the standard normal distribution.
Finally, the extraction module 202 passes the normalized first, second, and third normalized images each through a 1*1*42 convolution kernel to adjust the channel number and then stacks the adjusted results along the third dimension (i.e., the channel dimension), tripling the channel number. For example, three 200*150 image features each pass through a 1*1*42 convolution kernel, yielding three 200*150*42 image features, which are then stacked along the third dimension to give a 200*150*126 image feature.
The pooling module 203 is used to input the image features into an ROI Pooling network for pooling.
It should be noted that the ROI Pooling network performs only a pooling operation. Existing ROI Pooling has two pooling schemes: the first uses the SAME approach, i.e., zero-padding the input image feature into an image of equal width and height before pooling; the second uses a kernel whose width and height differ. In the present application, the pooling module 203 applies a 4*3 kernel size to the input 200*150*126 image feature, and after ROI Pooling the result is 13*13*126.
The adjustment module 204 is used to input the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result.
Specifically, the image feature in the pooling result after the ROI Pooling network is 13*13*126. The adjustment module 204 passes this 13*13*126 image feature through a 3*3*4 convolution kernel for fine adjustment, yielding a 13*13*4 image feature. The 3*3*4 convolution kernel increases the robustness of the whole system; at the same time, the 4-channel output effectively reduces dimensionality, greatly decreasing the number of model parameters and thus the time complexity of the whole network.
The classification module 205 is used to input the fine-adjusted result into an RPN network and then through a fully connected layer network to classify the target and the background, so as to obtain category information and position information of the target.
For example, the classification module 205 inputs the fine-adjusted result into the RPN network, after which the fully connected layer network (a 3*3*63 convolution kernel) processes the fine-adjusted image feature. The heat shown in the processing result determines whether the target in the image to be detected is a brain glioma or background, as well as the position information of the target: regions of higher heat are brain glioma, and regions of lower heat are background.
The present application increases the robustness of the system, effectively reduces dimensionality, and greatly decreases the number of parameters, thereby lowering the space and time complexity of the algorithm and markedly improving detection accuracy.
The present application further provides computer equipment, such as a smartphone, tablet computer, notebook computer, desktop computer, rack server, blade server, tower server, or cabinet server (including an independent server or a server cluster composed of multiple servers) capable of executing programs. The computer equipment of this embodiment includes at least, but is not limited to, a memory and a processor communicatively connected to each other through a system bus.
This embodiment further provides a non-volatile computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, or App store, on which computer-readable instructions are stored that implement the corresponding functions when the program is executed by a processor. The non-volatile computer-readable storage medium of this embodiment is used to store the target detection system 20 and, when executed by a processor, implements the following steps:
acquiring a picture to be detected;
inputting the picture into a modified VGG16 network for image feature extraction;
inputting the image features into an ROI Pooling network for pooling;
inputting the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result; and
inputting the fine-adjusted result into an RPN network and then through a fully connected layer network to classify the target and the background, so as to obtain category information and position information of the target.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A deep learning-based target detection method, comprising the steps of:
    acquiring a picture to be detected;
    inputting the picture into a modified VGG16 network for image feature extraction;
    inputting the image features into an ROI Pooling network for pooling;
    inputting the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result; and
    inputting the fine-adjusted result into an RPN network and then through a fully connected layer network to classify a target and a background, so as to obtain category information and position information of the target.
  2. The target detection method of claim 1, further comprising, before the step of inputting the picture into a modified VGG16 network for image feature extraction, the step of:
    building the modified VGG16 network;
    wherein the modified VGG16 network comprises 5 convolutional layers, 6 pooling layers, and 1 transposed convolutional layer, wherein 1 pooling layer is arranged between each pair of adjacent convolutional layers, 2 additional pooling layers are arranged after the 1st convolutional layer, the transposed convolutional layer is arranged after the 5th convolutional layer, each pooling layer is a max pooling layer, and a nonlinear activation function follows each convolutional layer.
  3. The target detection method of claim 1, wherein the step of inputting the picture into a modified VGG16 network for image feature extraction further comprises the steps of:
    passing the picture through 5 convolutional layers and 4 pooling layers to obtain the convolutional image feature of each layer; and
    storing the convolutional image features of the layers in a database.
  4. The target detection method of claim 3, further comprising, after the step of storing the convolutional image features of the layers in a database, the steps of:
    max-pooling the first-layer convolutional image feature to obtain a first image feature;
    setting the third-layer convolutional image feature as a second image feature, wherein the second image feature is the reference image feature; and
    transposed-convolving the fifth-layer convolutional image feature to obtain a third image feature, wherein the size of the third image feature is consistent with that of the first image feature and the second image feature, the size comprising width and height.
  5. The target detection method of claim 4, further comprising, after the step of transposed-convolving the fifth-layer convolutional image feature to obtain a third image feature, the step of:
    normalizing the first image feature, the second image feature, and the third image feature respectively to obtain a first normalized image, a second normalized image, and a third normalized image, so that the first, second, and third normalized images conform to the standard normal distribution.
  6. The target detection method of claim 5, further comprising, after the step of normalizing the first image feature, the second image feature, and the third image feature respectively, the step of:
    passing the first, second, and third normalized images each through a 1*1*42 convolution kernel to adjust the channel number.
  7. The target detection method of claim 6, further comprising, after the step of passing the first, second, and third normalized images each through a 1*1*42 convolution kernel, the step of:
    stacking the adjusted results along the channel dimension.
  8. An electronic device, comprising:
    an acquisition module for acquiring a picture to be detected;
    an extraction module for inputting the picture into a modified VGG16 network for image feature extraction;
    a pooling module for inputting the image features into an ROI Pooling network for pooling;
    an adjustment module for inputting the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result; and
    a classification module for inputting the fine-adjusted result into an RPN network and then through a fully connected layer network to classify a target and a background, so as to obtain category information and position information of the target.
  9. The electronic device of claim 8, further comprising a building module for:
    building the modified VGG16 network;
    wherein the modified VGG16 network comprises 5 convolutional layers, 6 pooling layers, and 1 transposed convolutional layer, wherein 1 pooling layer is arranged between each pair of adjacent convolutional layers, 2 additional pooling layers are arranged after the 1st convolutional layer, the transposed convolutional layer is arranged after the 5th convolutional layer, each pooling layer is a max pooling layer, and a nonlinear activation function follows each convolutional layer.
  10. The electronic device of claim 8, wherein the extraction module is further for:
    passing the picture through 5 convolutional layers and 4 pooling layers to obtain the convolutional image feature of each layer; and
    storing the convolutional image features of the layers in a database.
  11. The electronic device of claim 10, wherein the extraction module is further for:
    max-pooling the first-layer convolutional image feature to obtain a first image feature;
    setting the third-layer convolutional image feature as a second image feature, wherein the second image feature is the reference image feature; and
    transposed-convolving the fifth-layer convolutional image feature to obtain a third image feature, wherein the size of the third image feature is consistent with that of the first image feature and the second image feature, the size comprising width and height.
  12. The electronic device of claim 11, wherein the extraction module is further for:
    normalizing the first image feature, the second image feature, and the third image feature respectively to obtain a first normalized image, a second normalized image, and a third normalized image, so that the first, second, and third normalized images conform to the standard normal distribution.
  13. The electronic device of claim 12, wherein the extraction module is further for:
    passing the first, second, and third normalized images each through a 1*1*42 convolution kernel to adjust the channel number, and stacking the adjusted results along the channel dimension.
  14. Computer equipment, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring a picture to be detected;
    inputting the picture into a modified VGG16 network for image feature extraction;
    inputting the image features into an ROI Pooling network for pooling;
    inputting the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result; and
    inputting the fine-adjusted result into an RPN network and then through a fully connected layer network to classify a target and a background, so as to obtain category information and position information of the target.
  15. The computer equipment of claim 14, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    building the modified VGG16 network;
    wherein the modified VGG16 network comprises 5 convolutional layers, 6 pooling layers, and 1 transposed convolutional layer, wherein 1 pooling layer is arranged between each pair of adjacent convolutional layers, 2 additional pooling layers are arranged after the 1st convolutional layer, the transposed convolutional layer is arranged after the 5th convolutional layer, each pooling layer is a max pooling layer, and a nonlinear activation function follows each convolutional layer.
  16. The computer equipment of claim 14, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    passing the picture through 5 convolutional layers and 4 pooling layers to obtain the convolutional image feature of each layer; and
    storing the convolutional image features of the layers in a database.
  17. The computer equipment of claim 16, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    max-pooling the first-layer convolutional image feature to obtain a first image feature;
    setting the third-layer convolutional image feature as a second image feature, wherein the second image feature is the reference image feature; and
    transposed-convolving the fifth-layer convolutional image feature to obtain a third image feature, wherein the size of the third image feature is consistent with that of the first image feature and the second image feature, the size comprising width and height.
  18. The computer equipment of claim 17, wherein the computer-readable instructions, when executed by the processor, further implement the following step:
    normalizing the first image feature, the second image feature, and the third image feature respectively to obtain a first normalized image, a second normalized image, and a third normalized image, so that the first, second, and third normalized images conform to the standard normal distribution.
  19. The computer equipment of claim 18, wherein the computer-readable instructions, when executed by the processor, further implement the following step:
    passing the first, second, and third normalized images each through a 1*1*42 convolution kernel to adjust the channel number.
  20. A non-volatile computer-readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    acquiring a picture to be detected;
    inputting the picture into a modified VGG16 network for image feature extraction;
    inputting the image features into an ROI Pooling network for pooling;
    inputting the pooling result into a 3*3*4 convolution kernel to perform fine adjustment on the pooling result; and
    inputting the fine-adjusted result into an RPN network and then through a fully connected layer network to classify a target and a background, so as to obtain category information and position information of the target.
PCT/CN2019/102842 2019-07-03 2019-08-27 Deep learning-based target detection method and electronic device WO2021000404A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910593114.4A CN110503088B (zh) 2019-07-03 2019-07-03 Deep learning-based target detection method and electronic device
CN201910593114.4 2019-07-03

Publications (1)

Publication Number Publication Date
WO2021000404A1 true WO2021000404A1 (zh) 2021-01-07

Family

ID=68585851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102842 WO2021000404A1 (zh) 2019-07-03 2019-08-27 基于深度学习的目标检测方法及电子装置

Country Status (2)

Country Link
CN (1) CN110503088B (zh)
WO (1) WO2021000404A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034455A (zh) * 2021-03-17 2021-06-25 清华大学深圳国际研究生院 Method for detecting pits on flat objects
CN114155676A (zh) * 2021-11-29 2022-03-08 山东中烟工业有限责任公司 Detection and alarm system for damaged wooden pallets in a logistics system and working method thereof
CN115018788A (zh) * 2022-06-02 2022-09-06 常州晋陵电力实业有限公司 Intelligent-robot-based overhead line anomaly detection method and system
CN115937655A (zh) * 2023-02-24 2023-04-07 城云科技(中国)有限公司 Multi-order feature interaction target detection model, construction method, device, and application thereof

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN111523439B (zh) * 2020-04-21 2022-05-17 苏州浪潮智能科技有限公司 Deep learning-based target detection method, system, device, and medium
CN113393523B (zh) * 2021-06-04 2023-03-14 上海蓝色帛缔智能工程有限公司 Method, device, and electronic device for automatically monitoring machine room images

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107610087A (zh) * 2017-05-15 2018-01-19 华南理工大学 Deep learning-based automatic tongue coating segmentation method
CN108509978A (zh) * 2018-02-28 2018-09-07 中南大学 CNN-based multi-level feature fusion multi-class target detection method and model
CN109063559A (zh) * 2018-06-28 2018-12-21 东南大学 Pedestrian detection method based on improved region regression
US10325179B1 (en) * 2019-01-23 2019-06-18 StradVision, Inc. Learning method and learning device for pooling ROI by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CA3043352A1 (en) * 2016-11-15 2018-05-24 Magic Leap, Inc. Deep learning system for cuboid detection
CN108664838A (zh) * 2017-03-27 2018-10-16 北京中科视维文化科技有限公司 End-to-end pedestrian detection method for surveillance scenes based on an improved RPN deep network
CN109858495B (zh) * 2019-01-16 2023-09-22 五邑大学 Feature extraction method and device based on improved convolution blocks, and storage medium thereof

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN107610087A (zh) * 2017-05-15 2018-01-19 华南理工大学 Deep learning-based automatic tongue coating segmentation method
CN108509978A (zh) * 2018-02-28 2018-09-07 中南大学 CNN-based multi-level feature fusion multi-class target detection method and model
CN109063559A (zh) * 2018-06-28 2018-12-21 东南大学 Pedestrian detection method based on improved region regression
US10325179B1 (en) * 2019-01-23 2019-06-18 StradVision, Inc. Learning method and learning device for pooling ROI by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113034455A (zh) * 2021-03-17 2021-06-25 清华大学深圳国际研究生院 Method for detecting pits on flat objects
CN113034455B (zh) * 2021-03-17 2023-01-10 清华大学深圳国际研究生院 Method for detecting pits on flat objects
CN114155676A (zh) * 2021-11-29 2022-03-08 山东中烟工业有限责任公司 Detection and alarm system for damaged wooden pallets in a logistics system and working method thereof
CN115018788A (zh) * 2022-06-02 2022-09-06 常州晋陵电力实业有限公司 Intelligent-robot-based overhead line anomaly detection method and system
CN115018788B (zh) * 2022-06-02 2023-11-14 常州晋陵电力实业有限公司 Intelligent-robot-based overhead line anomaly detection method and system
CN115937655A (zh) * 2023-02-24 2023-04-07 城云科技(中国)有限公司 Multi-order feature interaction target detection model, construction method, device, and application thereof

Also Published As

Publication number Publication date
CN110503088A (zh) 2019-11-26
CN110503088B (zh) 2024-05-07

Similar Documents

Publication Publication Date Title
WO2021000404A1 (zh) Deep learning-based target detection method and electronic device
US9100630B2 (en) Object detection metadata
WO2019223147A1 (zh) 肝脏癌变定位方法、装置及存储介质
WO2020042754A1 (zh) 一种全息防伪码校验方法及装置
CN110688891A (zh) 采用3d批归一化的三维(3d)卷积
KR102277087B1 (ko) 콘텐츠 분류 방법 및 전자 장치
WO2021051547A1 (zh) 暴力行为检测方法及系统
CN110276408B (zh) 3d图像的分类方法、装置、设备及存储介质
US9886766B2 (en) Electronic device and method for adding data to image and extracting added data from image
WO2021174940A1 (zh) 人脸检测方法与系统
WO2021169126A1 (zh) 病灶分类模型训练方法、装置、计算机设备和存储介质
WO2021174941A1 (zh) 人体属性识别方法、系统、计算机设备及存储介质
WO2021189856A1 (zh) 证件校验方法、装置、电子设备及介质
WO2021057148A1 (zh) 基于神经网络的脑组织分层方法、装置、计算机设备
WO2020143302A1 (zh) 卷积神经网络模型优化方法、装置、计算机设备及存储介质
US20210303827A1 (en) Face feature point detection method and device, equipment and storage medium
WO2019217562A1 (en) Aggregated image annotation
US11010893B2 (en) Image identifying method and image identifying device
Alizadeh et al. A mobile application for early detection of melanoma by image processing algorithms
CN112036316B (zh) 手指静脉识别方法、装置、电子设备及可读存储介质
US9898799B2 (en) Method for image processing and electronic device supporting thereof
KR20210021663A (ko) 의료 영상 비식별화를 위한 신체 부위 위치 특정 방법, 프로그램 및 컴퓨팅 장치
US20220405916A1 (en) Method for detecting the presence of pneumonia area in medical images of patients, detecting system, and electronic device employing method
US11893068B2 (en) Electronic device and control method thereof
WO2022051479A1 (en) Quantitative imaging biomarker for lung cancer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 19935982; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase — Ref country code: DE
122 Ep: pct application non-entry in european phase — Ref document number: 19935982; Country of ref document: EP; Kind code of ref document: A1