CN111292377B - Target detection method, device, computer equipment and storage medium
Target detection method, device, computer equipment and storage medium
- Publication number
- CN111292377B (application CN202010166856.1A)
- Authority
- CN
- China
- Prior art keywords
- feature
- feature map
- target
- dimension
- different scales
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present application relates to a target detection method, apparatus, computer device, and storage medium. The method comprises the following steps: performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales; determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition; performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map; inputting the dimension-reduced target feature map into a fully connected network to obtain a hierarchical information feature; and performing target detection according to the plurality of feature maps of different scales and the hierarchical information feature to obtain category information and position information of the target in the image to be detected. By adding the hierarchical information feature to the target detection process, the classification features are effectively enhanced, thereby improving the accuracy of target detection.
Description
Technical Field
The present disclosure relates to the field of image data processing technologies, and in particular, to a target detection method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence technology, target detection has become one of the most fundamental tasks in computer vision and is widely applied in industry and daily life, for example in the fields of automatic driving, security monitoring, and game entertainment.
In the prior art, a target detection method predicts a bounding box through a convolutional neural network, and then fine-tunes the bounding box once more through a neural network to further improve the quality of the bounding box, thereby improving its accuracy.
However, with such conventional target detection methods, the accuracy of detecting the targets within the bounding boxes remains low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target detection method, apparatus, computer device, and storage medium capable of improving the accuracy of target detection.
A method of target detection, the method comprising:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
In one embodiment, performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map includes:
carrying out global maximum pooling on the target feature map to obtain a first pooling feature;
carrying out global average pooling on the target feature map to obtain a second pooling feature;
and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In one embodiment, performing object detection according to the feature maps of the multiple different scales and the hierarchical information features to obtain category information and position information of the object in the image to be detected, including:
obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features;
determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In one embodiment, obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features includes:
and fusing the feature graphs with different scales with the hierarchical information features to obtain a hierarchical information feature graph.
In one embodiment, the hierarchical information features include hierarchical categories.
In one embodiment, inputting the dimension reduced target feature map into a fully connected network to obtain a hierarchical information feature, including:
performing full connection on the dimension-reduced target feature map to obtain a multi-dimension vector;
determining a confidence score for each category according to the elements in the multi-dimensional vector;
and selecting a category with confidence score meeting a preset condition, and determining the category as the hierarchy category.
In one embodiment, determining the target feature map with spatial resolution meeting the preset condition in the feature maps with different scales includes:
and selecting the feature map with the minimum spatial resolution among the feature maps with different scales, and determining the feature map as a target feature map.
An object detection apparatus, the apparatus comprising:
the feature map extraction module is used for carrying out multi-scale feature extraction on the image to be detected to obtain a plurality of feature maps with different scales;
the feature map determining module is used for determining target feature maps, the spatial resolution of which meets preset conditions, of the feature maps with different scales;
the feature map dimension reduction module is used for reducing the dimension of the target feature map to obtain a dimension-reduced target feature map;
the feature map input module is used for inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and the target detection module is used for carrying out target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
According to the target detection method, the device, the computer equipment and the storage medium, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into the fully-connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
Drawings
FIG. 1 is a diagram of an application environment for a target detection method in one embodiment;
FIG. 2 is a flow chart of a method of detecting targets in one embodiment;
FIG. 3 is a flow diagram of a complementary scheme for dimension reduction of a target feature map in one embodiment;
FIG. 4 is a flow diagram of a complementary approach to object detection based on feature maps and hierarchical information features of multiple different scales in one embodiment;
FIG. 5 is a flow chart of a complementary scheme for inputting the reduced dimension target feature map into a fully connected network in one embodiment;
FIG. 6 is a block diagram of an object detection device in one embodiment;
FIG. 7 is an internal block diagram of a computer device in one embodiment;
fig. 8 is an internal structural view of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target detection method provided by the present application can be applied to the application environment shown in fig. 1, in which a hierarchical classification model 10 is added at the highest level of the feature pyramid network 20. The hierarchical classification model 10 includes a pooling network 102, a fully connected network (FC) 104, and a loss function (Loss) 106. Optionally, the pooling network 102 includes a global average pooling layer (GAP) and a global maximum pooling layer (GMP). The features extracted by the feature pyramid network 20 are input into an R-CNN network to obtain the target detection result.
In an exemplary embodiment, the application environment shown in fig. 1 may be deployed in a terminal; it may be understood that it may also be deployed in a server, or in a system comprising a terminal and a server and implemented through interaction between the terminal and the server.
In an exemplary embodiment, as shown in fig. 2, there is provided a target detection method, which may be specifically implemented by the following steps:
step S202, multi-scale feature extraction is carried out on an image to be detected, and a plurality of feature images with different scales are obtained.
Specifically, an image to be detected is first acquired and input into a convolutional neural network, which applies multiple convolution operations to the image to perform multi-scale feature extraction and obtain a plurality of feature maps of different scales. These feature maps, ordered by scale, form a feature pyramid. In the feature pyramid, feature maps at lower levels carry rich detail information, while feature maps at higher levels carry rich semantic information. It will be appreciated that the higher the level of a feature map, the richer the semantic information it contains, whereas its spatial resolution becomes smaller. It follows that the feature map at the highest level has the smallest spatial resolution.
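As a non-authoritative illustration of this step, the following is a minimal PyTorch sketch of a toy multi-scale backbone. The module, channel count, and number of levels are assumptions for illustration only; the patent does not fix a particular backbone, and a practical system would typically use a pretrained network with a feature pyramid on top.

```python
# Minimal sketch of multi-scale feature extraction (assumptions: PyTorch,
# a toy 4-level backbone with 256 channels per level; illustrative only).
import torch
import torch.nn as nn

class ToyPyramidBackbone(nn.Module):
    def __init__(self, out_channels=256, num_levels=4):
        super().__init__()
        self.stem = nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1)
        self.stages = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=2, padding=1)
            for _ in range(num_levels - 1)
        )

    def forward(self, image):
        x = torch.relu(self.stem(image))
        feature_maps = [x]                  # lower levels: rich detail information
        for stage in self.stages:
            x = torch.relu(stage(x))
            feature_maps.append(x)          # higher levels: rich semantic information
        return feature_maps                 # ordered from high to low spatial resolution

if __name__ == "__main__":
    feature_maps = ToyPyramidBackbone()(torch.randn(1, 3, 224, 224))
    print([tuple(f.shape) for f in feature_maps])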
Step S204, determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition.
Specifically, spatial resolution is taken as the selection criterion for the feature map and a preset condition is established; after the plurality of feature maps of different scales are obtained, only the feature map whose spatial resolution meets the preset condition is taken as the target feature map for subsequent processing. For example, the feature map with the smallest spatial resolution may be taken as the target feature map, or the feature map with the next smallest spatial resolution, and so on.
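For instance, the "smallest spatial resolution" condition can be expressed in one line; the tensor shapes below are arbitrary stand-ins for the pyramid levels and are illustrative assumptions only.

```python
import torch

# Illustrative stand-in for the pyramid levels (shapes are arbitrary examples).
feature_maps = [torch.randn(1, 256, s, s) for s in (112, 56, 28, 14)]

# Pick the feature map with the smallest spatial resolution, i.e. the highest
# pyramid level (one possible preset condition named in the text).
target_feature_map = min(feature_maps, key=lambda f: f.shape[-2] * f.shape[-1])
print(tuple(target_feature_map.shape))  # (1, 256, 14, 14)
```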
Step S206, performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map.
Specifically, after the target feature map is determined, dimension reduction is performed on it to obtain the dimension-reduced target feature map. Optionally, the target feature map may be input into a pooling network, which performs a pooling operation on it to achieve dimension reduction. Optionally, the pooling network includes a global maximum pooling layer and/or a global average pooling layer, in which case the dimension reduction may specifically be: inputting the target feature map into the global maximum pooling layer and/or the global average pooling layer, which perform the corresponding pooling operations on the target feature map to reduce its dimension.
Step S208, inputting the dimension-reduced target feature map into a fully connected network to obtain a hierarchical information feature.
Specifically, after the dimension-reduced target feature map is obtained, it is input into a fully connected network, which processes it to obtain the hierarchical information feature.
The hierarchical information features are used for realizing hierarchical classification, and the hierarchical information features can be represented by vectors. Optionally, the hierarchical information features include hierarchical categories.
Step S210, performing target detection according to the plurality of feature maps of different scales and the hierarchical information feature to obtain category information and position information of the target in the image to be detected.
Specifically, after the hierarchical information feature is obtained, target detection is performed according to the plurality of feature maps of different scales and the hierarchical information feature to obtain the category information and position information of the target in the image to be detected. Optionally, the plurality of feature maps of different scales and the hierarchical information feature may be input together into a target detection model, for example an R-CNN (Region-CNN), which predicts from them the category information and position information of the target in the image to be detected.
In the target detection method, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into a fully-connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, a possible implementation of performing dimension reduction on the target feature map to obtain the dimension-reduced target feature map is involved. On the basis of the above embodiment, as shown in fig. 3, step S206 may be specifically implemented by the following steps:
Step S2062, performing global maximum pooling on the target feature map to obtain a first pooled feature.
Global maximum pooling refers to taking, for a given feature map, the maximum value of the two-dimensional matrix of each channel as the maximum information of that channel; it is equivalent to representing each two-dimensional channel by a single maximum value.
Specifically, the target feature map is input into the global maximum pooling layer for global maximum pooling to obtain the first pooled feature, whose dimension is lower than that of the target feature map.
Step S2064, performing global average pooling on the target feature map to obtain a second pooled feature.
Global average pooling refers to taking, for a given feature map, the average value of the two-dimensional matrix of each channel as the average information of that channel; it is equivalent to representing each two-dimensional channel by a single average value.
Specifically, the target feature map is input into the global average pooling layer for global average pooling to obtain the second pooled feature, whose dimension is lower than that of the target feature map.
Step S2066, adding or splicing the first pooled feature and the second pooled feature to obtain the dimension-reduced target feature map.
Specifically, after the first pooled feature and the second pooled feature are obtained, they are added or spliced (concatenated) to obtain the dimension-reduced target feature map. Optionally, the first pooled feature and the second pooled feature have the same dimension, such that after addition or splicing a target feature map of that dimension is obtained.
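A hedged PyTorch sketch of steps S2062 to S2066 follows. Flattening the pooled outputs to (N, C) vectors, the channel count, and the input shape are assumptions for illustration, not values from the patent.

```python
# Sketch of the dimension reduction branch: global maximum pooling plus global
# average pooling of the target feature map, combined by addition or splicing
# (concatenation). Framework and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class GlobalPoolReduce(nn.Module):
    def __init__(self, combine="add"):
        super().__init__()
        self.gmp = nn.AdaptiveMaxPool2d(1)   # global maximum pooling layer
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling layer
        self.combine = combine

    def forward(self, target_feature_map):
        first_pooled = self.gmp(target_feature_map).flatten(1)    # (N, C)
        second_pooled = self.gap(target_feature_map).flatten(1)   # (N, C)
        if self.combine == "add":
            return first_pooled + second_pooled                   # stays C-dimensional
        return torch.cat([first_pooled, second_pooled], dim=1)    # 2C if spliced

target_feature_map = torch.randn(1, 256, 14, 14)                  # illustrative shape
reduced = GlobalPoolReduce(combine="add")(target_feature_map)
print(tuple(reduced.shape))  # (1, 256)
```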
In this embodiment of the application, performing global maximum pooling and global average pooling on the target feature map reduces the feature dimension on the one hand, and on the other hand allows the resulting target feature map to retain more of the background information and texture information of the image, which is beneficial to improving the accuracy of target detection.
In an exemplary embodiment, a possible implementation of performing target detection according to the plurality of feature maps of different scales and the hierarchical information feature to obtain the category information and position information of the target in the image to be detected is involved. On the basis of the above embodiment, as shown in fig. 4, step S210 may be specifically implemented by:
step S2102, obtaining a hierarchical information feature map according to a plurality of feature maps with different scales and hierarchical information features;
step S2104, determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
step S2106, inputting the characteristics of the target in the image to be detected into the fully connected network to obtain the category information and the position information of the target in the image to be detected.
Specifically, after the plurality of feature maps of different scales and the hierarchical information feature are obtained, they are fused to obtain a hierarchical information feature map. A region-of-interest pooling operation is then performed on candidate regions of the hierarchical information feature map to extract the corresponding features of the target in the image to be detected, and these target features are processed through a fully connected network to determine the category and position of the target, thereby completing target detection and obtaining the category information and position information of the target in the image to be detected. Optionally, the hierarchical information feature map may be input into a target detection model, for example an R-CNN (Region-CNN), which predicts the category information and position information of the target in the image to be detected.
More specifically, one way of fusing the plurality of feature maps of different scales with the hierarchical information feature is: first fuse the hierarchical information feature with the target feature map, then fuse the result with the feature map of the next layer, that is, fuse the feature maps of each layer sequentially in a top-down manner (see the sketch below). Either the fusion result over all layers or the fusion result of each individual layer may be called a hierarchical information feature map, which can be set according to actual requirements.
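The following sketch illustrates one possible top-down fusion, under the assumptions that all pyramid levels already share a common channel count and that the hierarchy feature is broadcast as a per-channel offset with nearest-neighbour upsampling between levels; the patent itself only specifies sequential top-down fusion, not the exact operation.

```python
# Sketch of top-down fusion of the hierarchy feature into the pyramid levels.
# Assumptions: all levels share C channels; the (N, C) hierarchy feature is
# broadcast spatially; nearest-neighbour upsampling carries the running result
# down to the next (higher resolution) level.
import torch
import torch.nn.functional as F

def fuse_top_down(feature_maps, hierarchy_feature):
    # feature_maps: list ordered from high to low spatial resolution.
    fused = feature_maps[-1] + hierarchy_feature[:, :, None, None]  # start at the top
    outputs = [fused]
    for fmap in reversed(feature_maps[:-1]):
        fused = fmap + F.interpolate(fused, size=fmap.shape[-2:], mode="nearest")
        outputs.append(fused)
    return outputs[::-1]   # hierarchical information feature maps, bottom-up order

levels = [torch.randn(1, 256, s, s) for s in (112, 56, 28, 14)]    # illustrative
fused_levels = fuse_top_down(levels, torch.randn(1, 256))
print([tuple(f.shape) for f in fused_levels])
```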
In this embodiment of the application, a feature map carrying hierarchical information is obtained by fusing the feature maps of different scales with the hierarchical information feature; using this feature map effectively enhances the classification features and thus improves the accuracy of target detection.
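To connect the fused feature map with the detection head described above (candidate-region pooling followed by fully connected classification and box regression), here is a hedged sketch using torchvision's RoI-Align operator; the candidate boxes, output size, class count, and head dimensions are illustrative assumptions rather than values from the patent.

```python
# Sketch of candidate-region pooling on the hierarchical information feature
# map, followed by a fully connected head predicting category and box offsets.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

hier_feature_map = torch.randn(1, 256, 56, 56)          # one fused pyramid level
boxes = [torch.tensor([[4.0, 4.0, 30.0, 40.0]])]        # candidate region(s), xyxy

# Pool each candidate region to a fixed 7x7 grid (spatial_scale maps box
# coordinates to feature-map coordinates; 1.0 here because the toy boxes are
# already expressed in feature-map units).
pooled = roi_align(hier_feature_map, boxes, output_size=(7, 7), spatial_scale=1.0)

num_classes = 80                                         # assumed number of categories
head = nn.Sequential(nn.Flatten(), nn.Linear(256 * 7 * 7, 1024), nn.ReLU())
feats = head(pooled)
cls_logits = nn.Linear(1024, num_classes)(feats)         # category information
box_deltas = nn.Linear(1024, 4 * num_classes)(feats)     # position information
print(cls_logits.shape, box_deltas.shape)
```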
In an exemplary embodiment, a possible implementation of fully connecting the dimension-reduced features to obtain the hierarchical information feature is involved. Taking the hierarchy category as an example, on the basis of the above embodiment, as shown in fig. 5, step S208 may be specifically implemented by the following steps:
step S2082, performing full connection on the dimension-reduced target feature map to obtain a multi-dimensional vector;
step S2084, determining the confidence score of each category according to the elements in the multidimensional vector;
step S2086, selecting a category with confidence score meeting a preset condition, and determining the category as a hierarchy category.
Specifically, assuming there are n categories to be predicted, the dimension-reduced target feature map is input into a fully connected network, which fully connects it to produce an n-dimensional vector. Each element of the n-dimensional vector represents the confidence score of the corresponding category, so the confidence score of each category is determined from the elements of the vector, and the category whose confidence score meets the preset condition is selected as the hierarchy category. Optionally, the category with the highest confidence score may be selected as the hierarchy category. For example, if the image to be detected contains a giraffe, the predicted hierarchy category is "animal"; if it contains a bus, the predicted hierarchy category is "vehicle".
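Continuing the illustration, a minimal sketch of the fully connected hierarchy branch is shown below; the number of hierarchy categories, the softmax scoring, and the argmax selection rule are assumptions (the patent only requires a confidence score per category and a preset selection condition).

```python
# Sketch of steps S2082-S2086: full connection to an n-dimensional vector,
# a confidence score per hierarchy category, then selection of the best one.
import torch
import torch.nn as nn

n_hierarchy_categories = 8                       # assumed, e.g. animal, vehicle, ...
reduced = torch.randn(1, 256)                    # dimension-reduced target feature

fc = nn.Linear(reduced.shape[1], n_hierarchy_categories)
scores = fc(reduced).softmax(dim=1)              # confidence score of each category
hierarchy_category = scores.argmax(dim=1)        # category meeting the preset condition
print(int(hierarchy_category))
```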
In this embodiment of the application, fully connecting the dimension-reduced target feature map and determining the hierarchy category by classification based on the confidence scores ensures the accuracy of the hierarchy category, thereby improving the accuracy of target detection.
In an exemplary embodiment, a training process for the hierarchical classification model is involved. Specifically, the training process includes: first, an image sample is acquired. Then, multi-scale feature extraction is performed on the image sample through a convolutional neural network to obtain a plurality of feature map samples of different scales. Next, the target feature map sample is dimension-reduced through the pooling network in the hierarchical classification model to obtain a dimension-reduced target feature map sample. The dimension-reduced target feature map sample is then processed through the fully connected network in the hierarchical classification model to obtain a hierarchy prediction result. Finally, the hierarchical classification loss is computed from the loss function and the hierarchy prediction result, and the model parameters are adjusted through this loss to obtain a trained hierarchical classification model.
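As a hedged illustration of this training step, a cross-entropy loss between the hierarchy prediction and a hierarchy label is one natural choice for the "hierarchical classification loss"; the patent does not name the concrete loss or optimizer, so both are assumptions below.

```python
# Sketch of supervising the hierarchy branch with a classification loss.
# Cross-entropy and SGD are assumed choices; shapes are illustrative.
import torch
import torch.nn as nn

n_hierarchy_categories = 8
fc = nn.Linear(256, n_hierarchy_categories)              # hierarchy branch head
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(fc.parameters(), lr=0.01)

reduced_samples = torch.randn(4, 256)                     # dimension-reduced samples
hierarchy_labels = torch.randint(0, n_hierarchy_categories, (4,))

hierarchy_pred = fc(reduced_samples)                      # hierarchy prediction result
loss = criterion(hierarchy_pred, hierarchy_labels)        # hierarchical classification loss
loss.backward()                                           # gradients for parameter update
optimizer.step()                                          # adjust model parameters
```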
In this embodiment of the application, a supervision signal is added directly on top of the original features, which acts as a regularizer on the features, so that during training the features are re-learned in the direction activated by the hierarchy, effectively improving feature learning.
The following experimental data further demonstrate the advantages of the technical solution of the present application; see Table 1:
Method | lr sched | dataset | mmAP |
---|---|---|---|
R50-FPN | 1 | COCO17 | 36.3 |
R50-FPN-MLL | 1 | COCO17 | 36.8 |
TABLE 1
As can be seen from Table 1, on the same dataset COCO17 the technical solution of the present application (R50-FPN-MLL) reaches an mmAP of 36.8, compared with 36.3 for the existing scheme (R50-FPN), showing an improvement.
It should be understood that, although the steps in the flowcharts of figs. 2-5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 2-5 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
In an exemplary embodiment, as shown in fig. 6, there is provided an object detection apparatus 30 including: a feature map extraction module 302, a feature map determination module 304, a feature map dimension reduction module 306, a feature map input module 308, and a target detection module 310, wherein:
the feature map extracting module 302 is configured to perform multi-scale feature extraction on an image to be detected, so as to obtain feature maps with a plurality of different scales.
The feature map determining module 304 is configured to determine target feature maps with spatial resolutions meeting preset conditions in a plurality of feature maps with different scales.
The feature map dimension reduction module 306 is configured to reduce dimensions of the target feature map to obtain a dimension-reduced target feature map.
The feature map input module 308 is configured to input the dimension-reduced target feature map into a fully-connected network to obtain a hierarchical information feature.
The object detection module 310 is configured to perform object detection according to a plurality of feature maps and hierarchical information features with different scales, so as to obtain category information and position information of an object in an image to be detected.
In the target detection device, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into the fully-connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, the feature map dimension reduction module 306 is specifically configured to perform global maximum pooling on the target feature map to obtain a first pooled feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the object detection module 310 is specifically configured to obtain a hierarchical information feature map according to a plurality of feature maps and hierarchical information features of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the object detection module 310 is specifically configured to fuse a plurality of feature maps with different scales and hierarchical information features to obtain a hierarchical information feature map.
In an exemplary embodiment, the hierarchical information features include hierarchical categories.
In an exemplary embodiment, the feature map input module 308 is specifically configured to fully connect the dimension-reduced target feature map to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the feature map determining module 304 is specifically configured to select a feature map with a minimum spatial resolution of a plurality of feature maps with different scales, and determine the feature map as the target feature map.
For specific limitations of the object detection apparatus, reference may be made to the above limitations of the target detection method, which are not repeated here. Each module in the above object detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In an exemplary embodiment, a computer device is provided, which may be a server, and an internal structure thereof may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object detection.
In an exemplary embodiment, a computer device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a target detection method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 7 and 8 are block diagrams of only some of the structures associated with the aspects of the present application and are not intended to limit the computer device to which the aspects of the present application may be applied, and that a particular computer device may include more or less components than those shown, or may combine some of the components, or may have a different arrangement of components.
In an exemplary embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor, when executing the computer program, performing the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs with spatial resolution meeting preset conditions in the feature graphs with different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature images and the hierarchical information features of a plurality of different scales to obtain category information and position information of the target in the image to be detected.
In the computer equipment, firstly, the dimension of the extracted feature map is reduced, then the feature map after dimension reduction is input into a fully-connected network to obtain the hierarchical information feature, and finally, the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: carrying out global maximum pooling on the target feature map to obtain a first pooling feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: obtaining a hierarchical information feature map according to the feature maps and the hierarchical information features of a plurality of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: and fusing the feature graphs with different scales with the hierarchical information features to obtain the hierarchical information feature graph.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: fully connecting the dimension-reduced target feature graphs to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: and selecting a plurality of feature images with different scales and the feature image with the minimum spatial resolution, and determining the feature image as a target feature image.
In an exemplary embodiment, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs with spatial resolution meeting preset conditions in the feature graphs with different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature images and the hierarchical information features of a plurality of different scales to obtain category information and position information of the target in the image to be detected.
In the computer readable storage medium, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into a fully connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: carrying out global maximum pooling on the target feature map to obtain a first pooling feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: obtaining a hierarchical information feature map according to the feature maps and the hierarchical information features of a plurality of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: and fusing the feature graphs with different scales with the hierarchical information features to obtain the hierarchical information feature graph.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: fully connecting the dimension-reduced target feature graphs to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: and selecting a plurality of feature images with different scales and the feature image with the minimum spatial resolution, and determining the feature image as a target feature image.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by way of a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may comprise the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above examples merely represent a few embodiments of the present application; although described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.
Claims (10)
1. A method of target detection, the method comprising:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; the hierarchical information features include hierarchical categories;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
2. The method of claim 1, wherein dimension reduction is performed on the target feature map to obtain a dimension reduced target feature map, comprising:
carrying out global maximum pooling on the target feature map to obtain a first pooling feature;
carrying out global average pooling on the target feature map to obtain a second pooling feature;
and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
3. The method according to claim 1, wherein performing object detection according to the feature maps of the multiple different scales and the hierarchical information features to obtain category information and position information of the object in the image to be detected includes:
obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features;
determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
4. A method according to claim 3, wherein deriving a hierarchical information feature map from the plurality of feature maps of different scales and the hierarchical information feature comprises:
and fusing the feature graphs with different scales with the hierarchical information features to obtain a hierarchical information feature graph.
5. The method of claim 1, wherein inputting the reduced-dimension target feature map into a fully-connected network to obtain a hierarchical information feature comprises:
performing full connection on the dimension-reduced target feature map to obtain a multi-dimension vector;
determining a confidence score for each category according to the elements in the multi-dimensional vector;
and selecting a category with confidence score meeting a preset condition, and determining the category as the hierarchy category.
6. The method of claim 1, wherein determining a target feature map for which spatial resolution meets a preset condition for the plurality of feature maps of different scales comprises:
and selecting the feature map with the minimum spatial resolution among the feature maps with different scales, and determining the feature map as a target feature map.
7. An object detection device, the device comprising:
the feature map extraction module is used for carrying out multi-scale feature extraction on the image to be detected to obtain a plurality of feature maps with different scales;
the feature map determining module is used for determining target feature maps, the spatial resolution of which meets preset conditions, of the feature maps with different scales;
the feature map dimension reduction module is used for reducing the dimension of the target feature map to obtain a dimension-reduced target feature map;
the feature map input module is used for inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; the hierarchical information features include hierarchical categories;
and the target detection module is used for carrying out target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
8. The apparatus of claim 7, wherein the feature map dimension reduction module is specifically configured to perform global maximum pooling on the target feature map to obtain a first pooled feature; carrying out global average pooling on the target feature map to obtain a second pooling feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010166856.1A CN111292377B (en) | 2020-03-11 | 2020-03-11 | Target detection method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010166856.1A CN111292377B (en) | 2020-03-11 | 2020-03-11 | Target detection method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292377A CN111292377A (en) | 2020-06-16 |
CN111292377B (en) | 2024-01-23 |
Family
ID=71022977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010166856.1A Active CN111292377B (en) | 2020-03-11 | 2020-03-11 | Target detection method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292377B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768392B (en) * | 2020-06-30 | 2022-10-14 | 创新奇智(广州)科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN111814905A (en) * | 2020-07-23 | 2020-10-23 | 上海眼控科技股份有限公司 | Target detection method, target detection device, computer equipment and storage medium |
CN111881996A (en) * | 2020-08-03 | 2020-11-03 | 上海眼控科技股份有限公司 | Object detection method, computer device and storage medium |
CN114066818B (en) * | 2021-10-23 | 2023-04-07 | 广州市艾贝泰生物科技有限公司 | Cell detection analysis method, cell detection analysis device, computer equipment and storage medium |
CN114359340A (en) * | 2021-12-27 | 2022-04-15 | 中国电信股份有限公司 | Tracking method, device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034054A (en) * | 2018-07-24 | 2018-12-18 | 华北电力大学 | Harmonic wave multi-tag classification method based on LSTM |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
WO2019101021A1 (en) * | 2017-11-23 | 2019-05-31 | 腾讯科技(深圳)有限公司 | Image recognition method, apparatus, and electronic device |
CN109886871A (en) * | 2019-01-07 | 2019-06-14 | 国家新闻出版广电总局广播科学研究院 | The image super-resolution method merged based on channel attention mechanism and multilayer feature |
Also Published As
Publication number | Publication date |
---|---|
CN111292377A (en) | 2020-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |