CN111292377B - Target detection method, device, computer equipment and storage medium - Google Patents

Target detection method, device, computer equipment and storage medium

Info

Publication number
CN111292377B
Authority
CN
China
Prior art keywords
feature
feature map
target
dimension
different scales
Prior art date
Legal status
Active
Application number
CN202010166856.1A
Other languages
Chinese (zh)
Other versions
CN111292377A (en)
Inventor
赵博睿
魏秀参
陈钊民
Current Assignee
Xuzhou Kuangshi Data Technology Co ltd
Nanjing Kuangyun Technology Co ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Xuzhou Kuangshi Data Technology Co ltd
Nanjing Kuangyun Technology Co ltd
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Xuzhou Kuangshi Data Technology Co ltd, Nanjing Kuangyun Technology Co ltd, Beijing Megvii Technology Co Ltd
Priority to CN202010166856.1A
Publication of CN111292377A
Application granted
Publication of CN111292377B

Classifications

    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/24: Classification techniques
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06V2201/07: Target detection
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a target detection method, a target detection device, a computer device and a storage medium. The method comprises the following steps: performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales; determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition; performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map; inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected. By adding the hierarchical information features to the target detection process, the method and the device effectively enhance the classification features and thereby improve the accuracy of target detection.

Description

Target detection method, device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of image data processing technologies, and in particular, to a target detection method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence technology, target detection is one of the most basic and primary tasks in computer vision, and is widely applied to various aspects of industry and daily life, such as the fields of automatic driving, security monitoring, game entertainment and the like.
In the prior art, a target detection method predicts a bounding box through a convolutional neural network and then fine-tunes the bounding box through the neural network once more, so as to further improve the quality of the bounding box and thereby its accuracy.
However, with such conventional target detection methods, the accuracy of detecting the target within the bounding box is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target detection method, apparatus, computer device, and storage medium capable of improving the accuracy of target detection.
A method of target detection, the method comprising:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales;
determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
In one embodiment, performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map includes:
carrying out global maximum pooling on the target feature map to obtain a first pooling feature;
carrying out global average pooling on the target feature map to obtain a second pooling feature;
and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In one embodiment, performing target detection according to the feature maps of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected includes:
obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features;
determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In one embodiment, obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features includes:
and fusing the feature maps of the multiple different scales with the hierarchical information features to obtain a hierarchical information feature map.
In one embodiment, the hierarchical information features include hierarchical categories.
In one embodiment, inputting the dimension-reduced target feature map into a fully-connected network to obtain the hierarchical information features includes:
performing full connection on the dimension-reduced target feature map to obtain a multi-dimensional vector;
determining a confidence score for each category according to the elements in the multi-dimensional vector;
and selecting a category whose confidence score meets a preset condition, and determining the category as the hierarchical category.
In one embodiment, determining, among the plurality of feature maps of different scales, the target feature map whose spatial resolution meets the preset condition includes:
selecting the feature map with the minimum spatial resolution among the plurality of feature maps of different scales, and determining it as the target feature map.
An object detection apparatus, the apparatus comprising:
the feature map extraction module is used for carrying out multi-scale feature extraction on the image to be detected to obtain a plurality of feature maps with different scales;
the feature map determining module is used for determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
the feature map dimension reduction module is used for reducing the dimension of the target feature map to obtain a dimension-reduced target feature map;
the feature map input module is used for inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and the target detection module is used for performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales;
determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales;
determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
According to the above target detection method, apparatus, computer device and storage medium, the dimension of the extracted feature map is first reduced, the dimension-reduced feature map is then input into a fully-connected network to obtain the hierarchical information features, and the target detection task is finally performed based on the hierarchical information features and the extracted feature maps. By adding the hierarchical information features to the target detection process, the classification features can be effectively enhanced, thereby improving the accuracy of target detection.
Drawings
FIG. 1 is a diagram of an application environment for a target detection method in one embodiment;
FIG. 2 is a flow chart of a method of detecting targets in one embodiment;
FIG. 3 is a flow diagram of a complementary scheme for dimension reduction of a target feature map in one embodiment;
FIG. 4 is a flow diagram of a complementary approach to object detection based on feature maps and hierarchical information features of multiple different scales in one embodiment;
FIG. 5 is a flow chart of a complementary scheme for inputting the reduced dimension target feature map into a fully connected network in one embodiment;
FIG. 6 is a block diagram of an object detection device in one embodiment;
FIG. 7 is an internal block diagram of a computer device in one embodiment;
FIG. 8 is an internal structure diagram of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target detection method provided by the application can be applied to the application environment shown in fig. 1, in which a hierarchical classification model 10 is added on top of the highest level of a feature pyramid network 20. The hierarchical classification model 10 includes a pooling network 102, a fully-connected network (FC) 104 and a loss function (Loss) 106. Optionally, the pooling network 102 includes a global average pooling layer (GAP) and a global maximum pooling layer (GMP). The features extracted by the feature pyramid network 20 are input into an R-CNN network to obtain the target detection result.
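For illustration only (this sketch is not part of the original disclosure), the hierarchical classification model 10 of fig. 1 could be assembled roughly as follows in PyTorch; the class name HierarchicalClassificationModel and its arguments are hypothetical:

```python
# Illustrative sketch only; the class name and arguments are hypothetical.
import torch
import torch.nn as nn

class HierarchicalClassificationModel(nn.Module):
    """Pooling network + fully-connected network + loss, attached to the
    highest (smallest-resolution) level of a feature pyramid, as in fig. 1."""
    def __init__(self, in_channels: int, num_hier_classes: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling layer (GAP)
        self.gmp = nn.AdaptiveMaxPool2d(1)   # global maximum pooling layer (GMP)
        self.fc = nn.Linear(in_channels, num_hier_classes)  # fully-connected network
        self.criterion = nn.CrossEntropyLoss()               # hierarchical classification loss

    def forward(self, top_feature_map, hier_labels=None):
        # reduce (N, C, H, W) to (N, C) by adding the two pooled features
        pooled = (self.gap(top_feature_map) + self.gmp(top_feature_map)).flatten(1)
        logits = self.fc(pooled)  # hierarchical information features
        loss = self.criterion(logits, hier_labels) if hier_labels is not None else None
        return logits, loss
```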
In an exemplary embodiment, the application environment shown in fig. 1 may be provided in a terminal. It can be understood that the application environment may also be provided in a server, or in a system including the terminal and the server and implemented through interaction between the terminal and the server.
In an exemplary embodiment, as shown in fig. 2, there is provided a target detection method, which may be specifically implemented by the following steps:
Step S202, performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales.
Specifically, an image to be detected is first acquired and input into a convolutional neural network, and multiple convolution operations are performed on it by the convolutional neural network to realize multi-scale feature extraction, yielding a plurality of feature maps of different scales. Ordered by scale, these feature maps form a feature pyramid. In the feature pyramid, feature maps at lower levels carry rich detail information, while feature maps at higher levels carry rich semantic information. It will be appreciated that the higher the level of a feature map, the richer the semantic information it contains, whereas for spatial resolution, the higher the level of a feature map, the smaller its spatial resolution. It follows that the feature map at the highest level has the smallest spatial resolution.
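As an illustrative sketch of this multi-scale extraction (assuming a PyTorch-style implementation; a practical detector would normally use a deeper backbone such as a ResNet with a feature pyramid network, and the module names here are hypothetical):

```python
# Illustrative sketch only: a toy backbone that yields feature maps of
# several scales; a practical detector would use e.g. a ResNet plus an FPN.
import torch
import torch.nn as nn

class TinyMultiScaleBackbone(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        def block(c_in):
            return nn.Sequential(
                nn.Conv2d(c_in, channels, kernel_size=3, stride=2, padding=1), nn.ReLU())
        self.stage1 = block(3)
        self.stage2 = block(channels)
        self.stage3 = block(channels)

    def forward(self, image):
        c1 = self.stage1(image)  # highest spatial resolution, rich detail information
        c2 = self.stage2(c1)
        c3 = self.stage3(c2)     # lowest spatial resolution, richest semantic information
        return [c1, c2, c3]      # feature maps of different scales

feature_maps = TinyMultiScaleBackbone()(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in feature_maps])  # spatial size halves at every level
```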
Step S204, determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition.
Specifically, the spatial resolution is taken as the selection parameter for the feature maps and a preset condition is established; after the plurality of feature maps of different scales are obtained, only a feature map whose spatial resolution meets the preset condition is taken as the target feature map for subsequent processing. For example, the feature map with the smallest spatial resolution may be taken as the target feature map, or the feature map with the next smallest spatial resolution may be taken as the target feature map, and so on.
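Assuming, as in the sketch above, that the preset condition is the minimum spatial resolution, the selection of step S204 reduces to a one-liner (illustrative only):

```python
# Illustrative sketch only: the preset condition is assumed to be the minimum
# spatial resolution, i.e. the topmost level of the feature pyramid.
import torch

def select_target_feature_map(feature_maps):
    return min(feature_maps, key=lambda f: f.shape[-2] * f.shape[-1])

maps = [torch.randn(1, 64, s, s) for s in (112, 56, 28)]
assert select_target_feature_map(maps).shape[-1] == 28  # smallest map is chosen
```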
Step S206, performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map.
Specifically, after the target feature map is determined, dimension reduction is performed on it to obtain the dimension-reduced target feature map. Optionally, the target feature map may be input into a pooling network, and a pooling operation is performed on it by the pooling network to achieve the dimension reduction. Optionally, the pooling network includes a global maximum pooling layer and/or a global average pooling layer; in that case the dimension reduction may specifically be carried out by inputting the target feature map into the global maximum pooling layer and/or the global average pooling layer and performing the corresponding pooling operation on it, so as to reduce the dimension.
Step S208, inputting the dimension-reduced target feature map into a fully-connected network to obtain the hierarchical information features.
Specifically, after the dimension-reduced target feature map is obtained, it is input into a fully-connected network and processed by the fully-connected network to obtain the hierarchical information features.
The hierarchical information features are used for realizing hierarchical classification, and the hierarchical information features can be represented by vectors. Optionally, the hierarchical information features include hierarchical categories.
Step S210, performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
Specifically, after the hierarchical information features are obtained, target detection is performed according to the plurality of feature maps of different scales and the hierarchical information features, so as to obtain the category information and the position information of the target in the image to be detected. Optionally, the plurality of feature maps of different scales and the hierarchical information features may be input together into a target detection model, for example an R-CNN (Region-CNN), which predicts the category information and the position information of the target in the image to be detected.
In the above target detection method, the dimension of the extracted feature map is first reduced, the dimension-reduced feature map is then input into a fully-connected network to obtain the hierarchical information features, and the target detection task is finally performed based on the hierarchical information features and the extracted feature maps. By adding the hierarchical information features to the target detection process, the classification features can be effectively enhanced, thereby improving the accuracy of target detection.
This exemplary embodiment relates to a possible implementation of performing dimension reduction on the target feature map to obtain the dimension-reduced target feature map. On the basis of the above embodiment, as shown in fig. 3, step S206 may specifically be implemented by the following steps:
Step S2062, performing global maximum pooling on the target feature map to obtain a first pooled feature.
Global maximum pooling refers to taking, for a given feature map, the maximum value of the two-dimensional matrix of each channel as the maximum information of that channel. It is equivalent to representing the two-dimensional channel by a single maximum value.
Specifically, the target feature map is input into a global maximum pooling layer for global maximum pooling to obtain the first pooled feature, whose dimension is lower than that of the target feature map.
Step S2064, performing global average pooling on the target feature map to obtain a second pooled feature.
Global average pooling refers to taking, for a given feature map, the average value of the two-dimensional matrix of each channel as the average information of that channel. It is equivalent to representing the two-dimensional channel by a single average value.
Specifically, the target feature map is input into a global average pooling layer for global average pooling to obtain the second pooled feature, whose dimension is lower than that of the target feature map.
Step S2066, adding or splicing the first pooled feature and the second pooled feature to obtain the dimension-reduced target feature map.
Specifically, after the first pooled feature and the second pooled feature are obtained, they are added or spliced to obtain the dimension-reduced target feature map. Optionally, the first pooled feature has the same dimension as the second pooled feature, so that after addition or splicing a target feature map of the corresponding dimension is obtained.
In this embodiment of the application, global maximum pooling and global average pooling are respectively performed on the target feature map; on the one hand this reduces the feature dimension, and on the other hand the resulting target feature map retains more of the background information and texture information of the image, which helps improve the accuracy of target detection.
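An illustrative sketch of steps S2062 to S2066 in PyTorch, showing both the addition variant (dimension C preserved) and the splicing variant (dimension 2C):

```python
# Illustrative sketch of steps S2062 to S2066 (PyTorch assumed).
import torch
import torch.nn as nn

gmp = nn.AdaptiveMaxPool2d(1)  # global maximum pooling
gap = nn.AdaptiveAvgPool2d(1)  # global average pooling

target_feature_map = torch.randn(2, 256, 25, 38)      # (N, C, H, W)
first_pooled = gmp(target_feature_map).flatten(1)     # (N, C)
second_pooled = gap(target_feature_map).flatten(1)    # (N, C)

reduced_by_adding = first_pooled + second_pooled                       # dimension C
reduced_by_splicing = torch.cat([first_pooled, second_pooled], dim=1)  # dimension 2C
```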
This exemplary embodiment relates to a possible implementation of performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain the category information and the position information of the target in the image to be detected. On the basis of the above embodiment, as shown in fig. 4, step S210 may specifically be implemented by the following steps:
Step S2102, obtaining a hierarchical information feature map according to the plurality of feature maps of different scales and the hierarchical information features;
Step S2104, determining candidate areas in the hierarchical information feature map, and performing a pooling operation on the candidate areas to extract features of the target in the image to be detected;
Step S2106, inputting the features of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
Specifically, after the plurality of feature maps of different scales and the hierarchical information features are obtained, the feature maps of different scales are fused with the hierarchical information features to obtain the hierarchical information feature map. Then, a region-of-interest pooling operation based on candidate areas is performed on the hierarchical information feature map to extract the corresponding features of the target in the image to be detected, and these target features are processed by a fully-connected network to judge the category and the position of the target, thereby completing the target detection and obtaining the category information and the position information of the target in the image to be detected. Alternatively, the hierarchical information feature map may be input into a target detection model, for example an R-CNN (Region-CNN), by which the category information and the position information of the target in the image to be detected are predicted.
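For illustration only, a simplified detection head in the spirit of this step; the plain crop-and-pool below stands in for the region-of-interest pooling of a real R-CNN head, a single input image is assumed, and all names are hypothetical:

```python
# Illustrative sketch only: a single input image (N = 1) is assumed, and the
# crop-and-pool stands in for region-of-interest pooling in a real R-CNN head.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDetectionHead(nn.Module):
    def __init__(self, channels: int, num_classes: int, pooled_size: int = 7):
        super().__init__()
        self.pooled_size = pooled_size
        self.fc = nn.Sequential(
            nn.Linear(channels * pooled_size * pooled_size, 1024), nn.ReLU())
        self.cls_score = nn.Linear(1024, num_classes)       # category information
        self.bbox_pred = nn.Linear(1024, 4 * num_classes)   # position information

    def forward(self, hier_feature_map, candidate_areas):
        # candidate_areas: list of (x1, y1, x2, y2) boxes in feature-map coordinates
        crops = []
        for x1, y1, x2, y2 in candidate_areas:
            region = hier_feature_map[:, :, y1:y2, x1:x2]
            crops.append(F.adaptive_max_pool2d(region, self.pooled_size))  # pooling operation
        feats = self.fc(torch.cat(crops).flatten(1))
        return self.cls_score(feats), self.bbox_pred(feats)
```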
More specifically, the plurality of feature maps of different scales may be fused with the hierarchical information features as follows: first, the hierarchical information features are fused with the target feature map, and the fusion result is then fused with the feature map of the next lower layer; that is, the feature maps of each layer are fused in turn in a top-down manner. Either the fusion result over all layers or the fusion result of each individual layer may be referred to as a hierarchical information feature map, which can be set according to actual requirements.
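An illustrative sketch of this top-down fusion, under the additional assumption that the hierarchical information features have already been projected to the same channel count as the pyramid levels (a detail not fixed by the text above):

```python
# Illustrative sketch only: the hierarchical information features are assumed
# to have been projected to the same channel count C as the pyramid levels.
import torch
import torch.nn.functional as F

def fuse_top_down(feature_maps, hier_features):
    """feature_maps: list of (N, C, H, W) tensors ordered from high to low resolution.
    hier_features: (N, C) hierarchical information features."""
    fused = list(feature_maps)
    # first fuse the hierarchical information features with the topmost (target) map
    fused[-1] = fused[-1] + hier_features[:, :, None, None]
    # then propagate the fusion result downwards, level by level
    for i in range(len(fused) - 2, -1, -1):
        fused[i] = fused[i] + F.interpolate(
            fused[i + 1], size=fused[i].shape[-2:], mode="nearest")
    return fused  # hierarchical information feature map(s)
```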
In this embodiment of the application, a feature map carrying hierarchical information is obtained by fusing the feature maps of different scales with the hierarchical information features; using this feature map effectively enhances the classification features and thus improves the accuracy of target detection.
This exemplary embodiment relates to a possible implementation of fully connecting the dimension-reduced features to obtain the hierarchical information features. Taking the hierarchical category as an example, on the basis of the above embodiment, as shown in fig. 5, step S208 may specifically be implemented by the following steps:
Step S2082, performing full connection on the dimension-reduced target feature map to obtain a multi-dimensional vector;
Step S2084, determining the confidence score of each category according to the elements in the multi-dimensional vector;
Step S2086, selecting a category whose confidence score meets a preset condition, and determining the category as the hierarchical category.
Specifically, assuming there are n categories to be predicted, the dimension-reduced target feature map is input into a fully-connected network and fully connected by it to obtain an n-dimensional vector. Each element in the n-dimensional vector represents the confidence score of the corresponding category, so the confidence score of each category is determined from the elements of the multi-dimensional vector, and the category whose confidence score meets the preset condition is selected and determined as the hierarchical category. Optionally, the category with the highest confidence score may be selected as the hierarchical category. For example, if the image to be detected contains a giraffe, the predicted hierarchical category is animal; if the image to be detected contains a bus, the predicted hierarchical category is vehicle.
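A minimal sketch of steps S2082 to S2086; the hierarchy category names and the 256-dimensional input are hypothetical:

```python
# Illustrative sketch of steps S2082 to S2086; names and sizes are hypothetical.
import torch
import torch.nn as nn

hier_names = ["animal", "vehicle", "furniture"]   # n hierarchy categories
fc = nn.Linear(256, len(hier_names))              # fully-connected layer
reduced_feature = torch.randn(1, 256)             # dimension-reduced target feature map

logits = fc(reduced_feature)                      # n-dimensional vector
confidences = logits.softmax(dim=1)               # confidence score per category
hier_category = hier_names[confidences.argmax(dim=1).item()]
print(hier_category)                              # highest-scoring hierarchy category
```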
In this embodiment of the application, fully connecting the dimension-reduced target feature map and determining the hierarchical category by classification based on the confidence scores ensures the accuracy of the hierarchical category, thereby improving the accuracy of target detection.
In an exemplary embodiment, a training process for the hierarchical classification model is involved. Specifically, the training process includes: first, an image sample is acquired. Multi-scale feature extraction is then performed on the image sample by a convolutional neural network to obtain a plurality of feature map samples of different scales. The dimension of the target feature map sample is then reduced by the pooling network in the hierarchical classification model to obtain a dimension-reduced target feature map sample. The dimension-reduced target feature map sample is then processed by the fully-connected network in the hierarchical classification model to obtain a hierarchical prediction result. The hierarchical classification loss is computed from the loss function and the hierarchical prediction result, and the model parameters are adjusted according to the hierarchical classification loss to obtain the trained hierarchical classification model.
In this embodiment of the application, a supervision signal is added directly on top of the original features, which is equivalent to applying a regularization to the features, so that during training the features are re-learned in the direction activated by the hierarchy; this effectively improves feature learning.
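A minimal sketch of this supervision signal during training; its combination with the ordinary detection losses is indicated only as a comment, since the exact weighting is not specified here:

```python
# Illustrative sketch only: a hierarchical classification loss supervising the
# hierarchy prediction during training.
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
hier_logits = torch.randn(4, 3, requires_grad=True)  # predictions for 4 samples, 3 hierarchy classes
hier_labels = torch.tensor([0, 2, 1, 0])             # ground-truth hierarchy categories

hier_loss = criterion(hier_logits, hier_labels)
# total_loss = detection_loss + hier_loss            # hypothetical combination
hier_loss.backward()  # gradients act as a regularizer on the shared features
```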
The following further demonstrates the advantages of the technical solutions of the present application by listing some experimental data, see table 1:
Method        lr sched    Dataset    mmAP
R50-FPN       1           COCO17     36.3
R50-FPN-MLL   1           COCO17     36.8

TABLE 1
As can be seen from Table 1, on the same dataset COCO17 the technical solution of the present application (R50-FPN-MLL) reaches an mmAP of 36.8, whereas the existing solution (R50-FPN) reaches 36.3, showing an improvement.
It should be understood that, although the steps in the flowcharts of figs. 2-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to that order, and they may be performed in other orders. Moreover, at least some of the steps in figs. 2-5 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; the execution order of these sub-steps or stages is not necessarily sequential either, and they may be performed in turn or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
In an exemplary embodiment, as shown in fig. 6, there is provided an object detection apparatus 30 including: a feature map extraction module 302, a feature map determination module 304, a feature map dimension reduction module 306, a feature map input module 308, and a target detection module 310, wherein:
the feature map extracting module 302 is configured to perform multi-scale feature extraction on an image to be detected, so as to obtain feature maps with a plurality of different scales.
The feature map determining module 304 is configured to determine target feature maps with spatial resolutions meeting preset conditions in a plurality of feature maps with different scales.
The feature map dimension reduction module 306 is configured to reduce dimensions of the target feature map to obtain a dimension-reduced target feature map.
The feature map input module 308 is configured to input the dimension-reduced target feature map into a fully-connected network to obtain a hierarchical information feature.
The object detection module 310 is configured to perform target detection according to the plurality of feature maps of different scales and the hierarchical information features, so as to obtain category information and position information of the target in the image to be detected.
In the above target detection apparatus, the dimension of the extracted feature map is first reduced, the dimension-reduced feature map is then input into a fully-connected network to obtain the hierarchical information features, and the target detection task is finally performed based on the hierarchical information features and the extracted feature maps. By adding the hierarchical information features to the target detection process, the classification features can be effectively enhanced, thereby improving the accuracy of target detection.
In an exemplary embodiment, the feature map dimension reduction module 306 is specifically configured to perform global maximum pooling on the target feature map to obtain a first pooled feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the object detection module 310 is specifically configured to obtain a hierarchical information feature map according to a plurality of feature maps and hierarchical information features of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the object detection module 310 is specifically configured to fuse a plurality of feature maps with different scales and hierarchical information features to obtain a hierarchical information feature map.
In an exemplary embodiment, the hierarchical information features include hierarchical categories.
In an exemplary embodiment, the feature map input module 308 is specifically configured to fully connect the dimension-reduced target feature map to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the feature map determining module 304 is specifically configured to select a feature map with a minimum spatial resolution of a plurality of feature maps with different scales, and determine the feature map as the target feature map.
For specific limitations of the object detection apparatus, reference may be made to the above limitations of the object detection method, which are not repeated here. Each module in the above object detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. Each of the above modules may be embedded in or independent of a processor in the computer device in the form of hardware, or stored in a memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to each module.
In an exemplary embodiment, a computer device is provided, which may be a server, and an internal structure thereof may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object detection.
In an exemplary embodiment, a computer device, which may be a terminal, is provided, and an internal structure diagram thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of object detection. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 7 and fig. 8 are merely block diagrams of portions of the structure related to the solution of the present application and do not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an exemplary embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor, when executing the computer program, performing the steps of:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales;
determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
In the above computer device, the dimension of the extracted feature map is first reduced, the dimension-reduced feature map is then input into a fully-connected network to obtain the hierarchical information features, and the target detection task is finally performed based on the hierarchical information features and the extracted feature maps. By adding the hierarchical information features to the target detection process, the classification features can be effectively enhanced, thereby improving the accuracy of target detection.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: carrying out global maximum pooling on the target feature map to obtain a first pooling feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: obtaining a hierarchical information feature map according to the feature maps and the hierarchical information features of a plurality of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: fusing the feature maps of different scales with the hierarchical information features to obtain the hierarchical information feature map.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: fully connecting the dimension-reduced target feature map to obtain a multi-dimensional vector; determining a confidence score for each category according to the elements in the multi-dimensional vector; and selecting the category whose confidence score meets the preset condition, and determining the category as the hierarchical category.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: selecting, among the plurality of feature maps of different scales, the feature map with the minimum spatial resolution, and determining it as the target feature map.
In an exemplary embodiment, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales;
determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
In the above computer-readable storage medium, the dimension of the extracted feature map is first reduced, the dimension-reduced feature map is then input into a fully-connected network to obtain the hierarchical information features, and the target detection task is finally performed based on the hierarchical information features and the extracted feature maps. By adding the hierarchical information features to the target detection process, the classification features can be effectively enhanced, thereby improving the accuracy of target detection.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: carrying out global maximum pooling on the target feature map to obtain a first pooling feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: obtaining a hierarchical information feature map according to the feature maps and the hierarchical information features of a plurality of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: fusing the feature maps of different scales with the hierarchical information features to obtain the hierarchical information feature map.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: fully connecting the dimension-reduced target feature map to obtain a multi-dimensional vector; determining a confidence score for each category according to the elements in the multi-dimensional vector; and selecting the category whose confidence score meets the preset condition, and determining the category as the hierarchical category.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: selecting, among the plurality of feature maps of different scales, the feature map with the minimum spatial resolution, and determining it as the target feature map.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration, and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), and the like.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above embodiments merely represent several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that various modifications and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A method of target detection, the method comprising:
performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales;
determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; the hierarchical information features include hierarchical categories;
and performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
2. The method of claim 1, wherein dimension reduction is performed on the target feature map to obtain a dimension reduced target feature map, comprising:
carrying out global maximum pooling on the target feature map to obtain a first pooling feature;
carrying out global average pooling on the target feature map to obtain a second pooling feature;
and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
3. The method according to claim 1, wherein performing object detection according to the feature maps of the multiple different scales and the hierarchical information features to obtain category information and position information of the object in the image to be detected includes:
obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features;
determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
4. A method according to claim 3, wherein deriving a hierarchical information feature map from the plurality of feature maps of different scales and the hierarchical information feature comprises:
and fusing the feature maps of the multiple different scales with the hierarchical information features to obtain a hierarchical information feature map.
5. The method of claim 1, wherein inputting the reduced-dimension target feature map into a fully-connected network to obtain a hierarchical information feature comprises:
performing full connection on the dimension-reduced target feature map to obtain a multi-dimensional vector;
determining a confidence score for each category according to the elements in the multi-dimensional vector;
and selecting a category whose confidence score meets a preset condition, and determining the category as the hierarchical category.
6. The method of claim 1, wherein determining a target feature map for which spatial resolution meets a preset condition for the plurality of feature maps of different scales comprises:
and selecting, among the plurality of feature maps of different scales, the feature map with the minimum spatial resolution, and determining it as the target feature map.
7. An object detection device, the device comprising:
the feature map extraction module is used for carrying out multi-scale feature extraction on the image to be detected to obtain a plurality of feature maps with different scales;
the feature map determining module is used for determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition;
the feature map dimension reduction module is used for reducing the dimension of the target feature map to obtain a dimension-reduced target feature map;
the feature map input module is used for inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; the hierarchical information features include hierarchical categories;
and the target detection module is used for performing target detection according to the plurality of feature maps of different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
8. The apparatus of claim 7, wherein the feature map dimension reduction module is specifically configured to perform global maximum pooling on the target feature map to obtain a first pooled feature; carrying out global average pooling on the target feature map to obtain a second pooling feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN202010166856.1A 2020-03-11 2020-03-11 Target detection method, device, computer equipment and storage medium Active CN111292377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010166856.1A CN111292377B (en) 2020-03-11 2020-03-11 Target detection method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111292377A CN111292377A (en) 2020-06-16
CN111292377B (en) 2024-01-23

Family

ID=71022977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010166856.1A Active CN111292377B (en) 2020-03-11 2020-03-11 Target detection method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111292377B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111768392B (en) * 2020-06-30 2022-10-14 创新奇智(广州)科技有限公司 Target detection method and device, electronic equipment and storage medium
CN111814905A (en) * 2020-07-23 2020-10-23 上海眼控科技股份有限公司 Target detection method, target detection device, computer equipment and storage medium
CN111881996A (en) * 2020-08-03 2020-11-03 上海眼控科技股份有限公司 Object detection method, computer device and storage medium
CN114066818B (en) * 2021-10-23 2023-04-07 广州市艾贝泰生物科技有限公司 Cell detection analysis method, cell detection analysis device, computer equipment and storage medium
CN114359340A (en) * 2021-12-27 2022-04-15 中国电信股份有限公司 Tracking method, device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034054A (en) * 2018-07-24 2018-12-18 华北电力大学 Harmonic wave multi-tag classification method based on LSTM
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
WO2019101021A1 (en) * 2017-11-23 2019-05-31 腾讯科技(深圳)有限公司 Image recognition method, apparatus, and electronic device
CN109886871A (en) * 2019-01-07 2019-06-14 国家新闻出版广电总局广播科学研究院 The image super-resolution method merged based on channel attention mechanism and multilayer feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant