CN111292377B - Target detection method, device, computer equipment and storage medium
Target detection method, device, computer equipment and storage medium
- Publication number
- CN111292377B (application CN202010166856.1A)
- Authority
- CN
- China
- Prior art keywords
- feature
- feature map
- target
- dimension
- different scales
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The present application relates to a target detection method, apparatus, computer device, and storage medium. The method comprises the following steps: performing multi-scale feature extraction on an image to be detected to obtain a plurality of feature maps of different scales; determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition; performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map; inputting the dimension-reduced target feature map into a fully connected network to obtain a hierarchical information feature; and performing target detection according to the plurality of feature maps of different scales and the hierarchical information feature to obtain category information and position information of the target in the image to be detected. By adding the hierarchical information feature to the target detection process, the classification features are effectively enhanced, thereby improving the accuracy of target detection.
Description
Technical Field
The present disclosure relates to the field of image data processing technologies, and in particular, to a target detection method, apparatus, computer device, and storage medium.
Background
With the development of artificial intelligence technology, target detection has become one of the most fundamental tasks in computer vision and is widely applied in industry and daily life, for example in the fields of automatic driving, security monitoring, and game entertainment.
In the prior art, a target detection method predicts a bounding box through a convolutional neural network, and then fine-tunes the bounding box once more through a neural network to further improve the quality of the bounding box, thereby improving its accuracy.
However, with such conventional target detection methods, the accuracy of detecting the targets within the bounding boxes remains low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target detection method, apparatus, computer device, and storage medium capable of improving the accuracy of target detection.
A method of target detection, the method comprising:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
In one embodiment, performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map includes:
carrying out global maximum pooling on the target feature map to obtain a first pooling feature;
carrying out global average pooling on the target feature map to obtain a second pooling feature;
and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In one embodiment, performing object detection according to the feature maps of the multiple different scales and the hierarchical information features to obtain category information and position information of the object in the image to be detected, including:
obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features;
determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In one embodiment, obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features includes:
and fusing the feature graphs with different scales with the hierarchical information features to obtain a hierarchical information feature graph.
In one embodiment, the hierarchical information features include hierarchical categories.
In one embodiment, inputting the dimension reduced target feature map into a fully connected network to obtain a hierarchical information feature, including:
performing full connection on the dimension-reduced target feature map to obtain a multi-dimension vector;
determining a confidence score for each category according to the elements in the multi-dimensional vector;
and selecting a category with confidence score meeting a preset condition, and determining the category as the hierarchy category.
In one embodiment, determining the target feature map with spatial resolution meeting the preset condition in the feature maps with different scales includes:
and selecting the feature map with the minimum spatial resolution among the feature maps with different scales, and determining the feature map as a target feature map.
An object detection apparatus, the apparatus comprising:
the feature map extraction module is used for carrying out multi-scale feature extraction on the image to be detected to obtain a plurality of feature maps with different scales;
the feature map determining module is used for determining target feature maps, the spatial resolution of which meets preset conditions, of the feature maps with different scales;
the feature map dimension reduction module is used for reducing the dimension of the target feature map to obtain a dimension-reduced target feature map;
the feature map input module is used for inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and the target detection module is used for carrying out target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
According to the target detection method, the device, the computer equipment and the storage medium, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into the fully-connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
Drawings
FIG. 1 is a diagram of an application environment for a target detection method in one embodiment;
FIG. 2 is a flow chart of a method of detecting targets in one embodiment;
FIG. 3 is a flow diagram of a complementary scheme for dimension reduction of a target feature map in one embodiment;
FIG. 4 is a flow diagram of a complementary approach to object detection based on feature maps and hierarchical information features of multiple different scales in one embodiment;
FIG. 5 is a flow chart of a complementary scheme for inputting the reduced dimension target feature map into a fully connected network in one embodiment;
FIG. 6 is a block diagram of an object detection device in one embodiment;
FIG. 7 is an internal block diagram of a computer device in one embodiment;
fig. 8 is an internal structural view of a computer device in another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The target detection method provided by the present application can be applied to the application environment shown in fig. 1, in which a hierarchical classification model 10 is added at the highest level of the feature pyramid network 20. The hierarchical classification model 10 includes a pooling network 102, a fully connected network (FC) 104, and a loss function (Loss) 106. Optionally, the pooling network 102 includes a global average pooling layer (GAP) and a global maximum pooling layer (GMP). The features extracted by the feature pyramid network 20 are input into an R-CNN network to obtain the target detection result.
In an exemplary embodiment, the application environment shown in fig. 1 may be deployed in a terminal; it may be understood that it may also be deployed in a server, or in a system comprising a terminal and a server and implemented through interaction between the terminal and the server.
In an exemplary embodiment, as shown in fig. 2, there is provided a target detection method, which may be specifically implemented by the following steps:
step S202, multi-scale feature extraction is carried out on an image to be detected, and a plurality of feature images with different scales are obtained.
Specifically, an image to be detected is first acquired and input into a convolutional neural network, which applies multiple convolution operations to the image to perform multi-scale feature extraction and obtain a plurality of feature maps of different scales. These feature maps, ordered by scale, form a feature pyramid. In the feature pyramid, feature maps at lower levels carry rich detail information, while feature maps at higher levels carry rich semantic information. It will be appreciated that the higher the level of a feature map, the richer the semantic information it contains, whereas its spatial resolution becomes smaller. It follows that the feature map at the highest level has the smallest spatial resolution.
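As a non-authoritative illustration of this step, the following is a minimal PyTorch sketch of a toy multi-scale backbone. The module, channel count, and number of levels are assumptions for illustration only; the patent does not fix a particular backbone, and a practical system would typically use a pretrained network with a feature pyramid on top.

```python
# Minimal sketch of multi-scale feature extraction (assumptions: PyTorch,
# a toy 4-level backbone with 256 channels per level; illustrative only).
import torch
import torch.nn as nn

class ToyPyramidBackbone(nn.Module):
    def __init__(self, out_channels=256, num_levels=4):
        super().__init__()
        self.stem = nn.Conv2d(3, out_channels, kernel_size=3, stride=2, padding=1)
        self.stages = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=2, padding=1)
            for _ in range(num_levels - 1)
        )

    def forward(self, image):
        x = torch.relu(self.stem(image))
        feature_maps = [x]                  # lower levels: rich detail information
        for stage in self.stages:
            x = torch.relu(stage(x))
            feature_maps.append(x)          # higher levels: rich semantic information
        return feature_maps                 # ordered from high to low spatial resolution

if __name__ == "__main__":
    feature_maps = ToyPyramidBackbone()(torch.randn(1, 3, 224, 224))
    print([tuple(f.shape) for f in feature_maps])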
Step S204, determining, among the plurality of feature maps of different scales, a target feature map whose spatial resolution meets a preset condition.
Specifically, spatial resolution is taken as the selection criterion for the feature map and a preset condition is established; after the plurality of feature maps of different scales are obtained, only the feature map whose spatial resolution meets the preset condition is taken as the target feature map for subsequent processing. For example, the feature map with the smallest spatial resolution may be taken as the target feature map, or the feature map with the next smallest spatial resolution, and so on.
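For instance, the "smallest spatial resolution" condition can be expressed in one line; the tensor shapes below are arbitrary stand-ins for the pyramid levels and are illustrative assumptions only.

```python
import torch

# Illustrative stand-in for the pyramid levels (shapes are arbitrary examples).
feature_maps = [torch.randn(1, 256, s, s) for s in (112, 56, 28, 14)]

# Pick the feature map with the smallest spatial resolution, i.e. the highest
# pyramid level (one possible preset condition named in the text).
target_feature_map = min(feature_maps, key=lambda f: f.shape[-2] * f.shape[-1])
print(tuple(target_feature_map.shape))  # (1, 256, 14, 14)
```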
Step S206, performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map.
Specifically, after the target feature map is determined, dimension reduction is performed on it to obtain the dimension-reduced target feature map. Optionally, the target feature map may be input into a pooling network, which performs a pooling operation on it to achieve dimension reduction. Optionally, the pooling network includes a global maximum pooling layer and/or a global average pooling layer, in which case the dimension reduction may specifically be: inputting the target feature map into the global maximum pooling layer and/or the global average pooling layer, which perform the corresponding pooling operations on the target feature map to reduce its dimension.
Step S208, inputting the dimension-reduced target feature map into a fully connected network to obtain a hierarchical information feature.
Specifically, after the dimension-reduced target feature map is obtained, it is input into a fully connected network, which processes it to obtain the hierarchical information feature.
The hierarchical information features are used for realizing hierarchical classification, and the hierarchical information features can be represented by vectors. Optionally, the hierarchical information features include hierarchical categories.
Step S210, performing target detection according to the plurality of feature maps of different scales and the hierarchical information feature to obtain category information and position information of the target in the image to be detected.
Specifically, after the hierarchical information feature is obtained, target detection is performed according to the plurality of feature maps of different scales and the hierarchical information feature to obtain the category information and position information of the target in the image to be detected. Optionally, the plurality of feature maps of different scales and the hierarchical information feature may be input together into a target detection model, for example an R-CNN (Region-CNN), which predicts from them the category information and position information of the target in the image to be detected.
In the target detection method, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into a fully-connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, a possible implementation of performing dimension reduction on the target feature map to obtain the dimension-reduced target feature map is involved. On the basis of the above embodiment, as shown in fig. 3, step S206 may be specifically implemented by the following steps:
Step S2062, performing global maximum pooling on the target feature map to obtain a first pooled feature.
Global maximum pooling refers to taking, for a given feature map, the maximum value of the two-dimensional matrix of each channel as the maximum information of that channel; it is equivalent to representing each two-dimensional channel by a single maximum value.
Specifically, the target feature map is input into the global maximum pooling layer for global maximum pooling to obtain the first pooled feature, whose dimension is lower than that of the target feature map.
Step S2064, performing global average pooling on the target feature map to obtain a second pooled feature.
Global average pooling refers to taking, for a given feature map, the average value of the two-dimensional matrix of each channel as the average information of that channel; it is equivalent to representing each two-dimensional channel by a single average value.
Specifically, the target feature map is input into the global average pooling layer for global average pooling to obtain the second pooled feature, whose dimension is lower than that of the target feature map.
Step S2066, adding or splicing the first pooled feature and the second pooled feature to obtain the dimension-reduced target feature map.
Specifically, after the first pooled feature and the second pooled feature are obtained, they are added or spliced (concatenated) to obtain the dimension-reduced target feature map. Optionally, the first pooled feature and the second pooled feature have the same dimension, such that after addition or splicing a target feature map of that dimension is obtained.
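A hedged PyTorch sketch of steps S2062 to S2066 follows. Flattening the pooled outputs to (N, C) vectors, the channel count, and the input shape are assumptions for illustration, not values from the patent.

```python
# Sketch of the dimension reduction branch: global maximum pooling plus global
# average pooling of the target feature map, combined by addition or splicing
# (concatenation). Framework and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class GlobalPoolReduce(nn.Module):
    def __init__(self, combine="add"):
        super().__init__()
        self.gmp = nn.AdaptiveMaxPool2d(1)   # global maximum pooling layer
        self.gap = nn.AdaptiveAvgPool2d(1)   # global average pooling layer
        self.combine = combine

    def forward(self, target_feature_map):
        first_pooled = self.gmp(target_feature_map).flatten(1)    # (N, C)
        second_pooled = self.gap(target_feature_map).flatten(1)   # (N, C)
        if self.combine == "add":
            return first_pooled + second_pooled                   # stays C-dimensional
        return torch.cat([first_pooled, second_pooled], dim=1)    # 2C if spliced

target_feature_map = torch.randn(1, 256, 14, 14)                  # illustrative shape
reduced = GlobalPoolReduce(combine="add")(target_feature_map)
print(tuple(reduced.shape))  # (1, 256)
```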
In this embodiment of the application, performing global maximum pooling and global average pooling on the target feature map reduces the feature dimension on the one hand, and on the other hand allows the resulting target feature map to retain more of the background information and texture information of the image, which is beneficial to improving the accuracy of target detection.
In an exemplary embodiment, a possible implementation of performing target detection according to the plurality of feature maps of different scales and the hierarchical information feature to obtain the category information and position information of the target in the image to be detected is involved. On the basis of the above embodiment, as shown in fig. 4, step S210 may be specifically implemented by:
step S2102, obtaining a hierarchical information feature map according to a plurality of feature maps with different scales and hierarchical information features;
step S2104, determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
step S2106, inputting the characteristics of the target in the image to be detected into the fully connected network to obtain the category information and the position information of the target in the image to be detected.
Specifically, after the plurality of feature maps of different scales and the hierarchical information feature are obtained, they are fused to obtain a hierarchical information feature map. A region-of-interest pooling operation is then performed on candidate regions of the hierarchical information feature map to extract the corresponding features of the target in the image to be detected, and these target features are processed through a fully connected network to determine the category and position of the target, thereby completing target detection and obtaining the category information and position information of the target in the image to be detected. Optionally, the hierarchical information feature map may be input into a target detection model, for example an R-CNN (Region-CNN), which predicts the category information and position information of the target in the image to be detected.
More specifically, one way of fusing the plurality of feature maps of different scales with the hierarchical information feature is: first fuse the hierarchical information feature with the target feature map, then fuse the result with the feature map of the next layer, that is, fuse the feature maps of each layer sequentially in a top-down manner (see the sketch below). Either the fusion result over all layers or the fusion result of each individual layer may be called a hierarchical information feature map, which can be set according to actual requirements.
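The following sketch illustrates one possible top-down fusion, under the assumptions that all pyramid levels already share a common channel count and that the hierarchy feature is broadcast as a per-channel offset with nearest-neighbour upsampling between levels; the patent itself only specifies sequential top-down fusion, not the exact operation.

```python
# Sketch of top-down fusion of the hierarchy feature into the pyramid levels.
# Assumptions: all levels share C channels; the (N, C) hierarchy feature is
# broadcast spatially; nearest-neighbour upsampling carries the running result
# down to the next (higher resolution) level.
import torch
import torch.nn.functional as F

def fuse_top_down(feature_maps, hierarchy_feature):
    # feature_maps: list ordered from high to low spatial resolution.
    fused = feature_maps[-1] + hierarchy_feature[:, :, None, None]  # start at the top
    outputs = [fused]
    for fmap in reversed(feature_maps[:-1]):
        fused = fmap + F.interpolate(fused, size=fmap.shape[-2:], mode="nearest")
        outputs.append(fused)
    return outputs[::-1]   # hierarchical information feature maps, bottom-up order

levels = [torch.randn(1, 256, s, s) for s in (112, 56, 28, 14)]    # illustrative
fused_levels = fuse_top_down(levels, torch.randn(1, 256))
print([tuple(f.shape) for f in fused_levels])
```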
In this embodiment of the application, a feature map carrying hierarchical information is obtained by fusing the feature maps of different scales with the hierarchical information feature; using this feature map effectively enhances the classification features and thus improves the accuracy of target detection.
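To connect the fused feature map with the detection head described above (candidate-region pooling followed by fully connected classification and box regression), here is a hedged sketch using torchvision's RoI-Align operator; the candidate boxes, output size, class count, and head dimensions are illustrative assumptions rather than values from the patent.

```python
# Sketch of candidate-region pooling on the hierarchical information feature
# map, followed by a fully connected head predicting category and box offsets.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

hier_feature_map = torch.randn(1, 256, 56, 56)          # one fused pyramid level
boxes = [torch.tensor([[4.0, 4.0, 30.0, 40.0]])]        # candidate region(s), xyxy

# Pool each candidate region to a fixed 7x7 grid (spatial_scale maps box
# coordinates to feature-map coordinates; 1.0 here because the toy boxes are
# already expressed in feature-map units).
pooled = roi_align(hier_feature_map, boxes, output_size=(7, 7), spatial_scale=1.0)

num_classes = 80                                         # assumed number of categories
head = nn.Sequential(nn.Flatten(), nn.Linear(256 * 7 * 7, 1024), nn.ReLU())
feats = head(pooled)
cls_logits = nn.Linear(1024, num_classes)(feats)         # category information
box_deltas = nn.Linear(1024, 4 * num_classes)(feats)     # position information
print(cls_logits.shape, box_deltas.shape)
```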
In an exemplary embodiment, a possible implementation of fully connecting the dimension-reduced features to obtain the hierarchical information feature is involved. Taking the hierarchy category as an example, on the basis of the above embodiment, as shown in fig. 5, step S208 may be specifically implemented by the following steps:
step S2082, performing full connection on the dimension-reduced target feature map to obtain a multi-dimensional vector;
step S2084, determining the confidence score of each category according to the elements in the multidimensional vector;
step S2086, selecting a category with confidence score meeting a preset condition, and determining the category as a hierarchy category.
Specifically, assuming there are n categories to be predicted, the dimension-reduced target feature map is input into a fully connected network, which fully connects it to produce an n-dimensional vector. Each element of the n-dimensional vector represents the confidence score of the corresponding category, so the confidence score of each category is determined from the elements of the vector, and the category whose confidence score meets the preset condition is selected as the hierarchy category. Optionally, the category with the highest confidence score may be selected as the hierarchy category. For example, if the image to be detected contains a giraffe, the predicted hierarchy category is "animal"; if it contains a bus, the predicted hierarchy category is "vehicle".
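Continuing the illustration, a minimal sketch of the fully connected hierarchy branch is shown below; the number of hierarchy categories, the softmax scoring, and the argmax selection rule are assumptions (the patent only requires a confidence score per category and a preset selection condition).

```python
# Sketch of steps S2082-S2086: full connection to an n-dimensional vector,
# a confidence score per hierarchy category, then selection of the best one.
import torch
import torch.nn as nn

n_hierarchy_categories = 8                       # assumed, e.g. animal, vehicle, ...
reduced = torch.randn(1, 256)                    # dimension-reduced target feature

fc = nn.Linear(reduced.shape[1], n_hierarchy_categories)
scores = fc(reduced).softmax(dim=1)              # confidence score of each category
hierarchy_category = scores.argmax(dim=1)        # category meeting the preset condition
print(int(hierarchy_category))
```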
In this embodiment of the application, fully connecting the dimension-reduced target feature map and determining the hierarchy category by classification based on the confidence scores ensures the accuracy of the hierarchy category, thereby improving the accuracy of target detection.
In an exemplary embodiment, a training process for the hierarchical classification model is involved. Specifically, the training process includes: first, an image sample is acquired. Then, multi-scale feature extraction is performed on the image sample through a convolutional neural network to obtain a plurality of feature map samples of different scales. Next, the target feature map sample is dimension-reduced through the pooling network in the hierarchical classification model to obtain a dimension-reduced target feature map sample. The dimension-reduced target feature map sample is then processed through the fully connected network in the hierarchical classification model to obtain a hierarchy prediction result. Finally, the hierarchical classification loss is computed from the loss function and the hierarchy prediction result, and the model parameters are adjusted through this loss to obtain a trained hierarchical classification model.
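As a hedged illustration of this training step, a cross-entropy loss between the hierarchy prediction and a hierarchy label is one natural choice for the "hierarchical classification loss"; the patent does not name the concrete loss or optimizer, so both are assumptions below.

```python
# Sketch of supervising the hierarchy branch with a classification loss.
# Cross-entropy and SGD are assumed choices; shapes are illustrative.
import torch
import torch.nn as nn

n_hierarchy_categories = 8
fc = nn.Linear(256, n_hierarchy_categories)              # hierarchy branch head
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(fc.parameters(), lr=0.01)

reduced_samples = torch.randn(4, 256)                     # dimension-reduced samples
hierarchy_labels = torch.randint(0, n_hierarchy_categories, (4,))

hierarchy_pred = fc(reduced_samples)                      # hierarchy prediction result
loss = criterion(hierarchy_pred, hierarchy_labels)        # hierarchical classification loss
loss.backward()                                           # gradients for parameter update
optimizer.step()                                          # adjust model parameters
```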
In this embodiment of the application, a supervision signal is added directly on top of the original features, which acts as a regularizer on the features, so that during training the features are re-learned in the direction activated by the hierarchy, effectively improving feature learning.
The following experimental data further demonstrate the advantages of the technical solution of the present application; see Table 1:
Method | lr sched | dataset | mmAP |
---|---|---|---|
R50-FPN | 1 | COCO17 | 36.3 |
R50-FPN-MLL | 1 | COCO17 | 36.8 |
TABLE 1
As can be seen from Table 1, on the same dataset COCO17 the technical solution of the present application (R50-FPN-MLL) reaches an mmAP of 36.8, compared with 36.3 for the existing scheme (R50-FPN), showing an improvement.
It should be understood that, although the steps in the flowcharts of figs. 2-5 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 2-5 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
In an exemplary embodiment, as shown in fig. 6, there is provided an object detection apparatus 30 including: a feature map extraction module 302, a feature map determination module 304, a feature map dimension reduction module 306, a feature map input module 308, and a target detection module 310, wherein:
the feature map extracting module 302 is configured to perform multi-scale feature extraction on an image to be detected, so as to obtain feature maps with a plurality of different scales.
The feature map determining module 304 is configured to determine target feature maps with spatial resolutions meeting preset conditions in a plurality of feature maps with different scales.
The feature map dimension reduction module 306 is configured to reduce dimensions of the target feature map to obtain a dimension-reduced target feature map.
The feature map input module 308 is configured to input the dimension-reduced target feature map into a fully-connected network to obtain a hierarchical information feature.
The object detection module 310 is configured to perform object detection according to a plurality of feature maps and hierarchical information features with different scales, so as to obtain category information and position information of an object in an image to be detected.
In the target detection device, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into the fully-connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, the feature map dimension reduction module 306 is specifically configured to perform global maximum pooling on the target feature map to obtain a first pooled feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the object detection module 310 is specifically configured to obtain a hierarchical information feature map according to a plurality of feature maps and hierarchical information features of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the object detection module 310 is specifically configured to fuse a plurality of feature maps with different scales and hierarchical information features to obtain a hierarchical information feature map.
In an exemplary embodiment, the hierarchical information features include hierarchical categories.
In an exemplary embodiment, the feature map input module 308 is specifically configured to fully connect the dimension-reduced target feature map to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the feature map determining module 304 is specifically configured to select a feature map with a minimum spatial resolution of a plurality of feature maps with different scales, and determine the feature map as the target feature map.
For specific limitations of the object detection apparatus, reference may be made to the above limitations of the target detection method, which are not repeated here. Each module in the above object detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or independent of, the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In an exemplary embodiment, a computer device is provided, which may be a server, and an internal structure thereof may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of object detection.
In an exemplary embodiment, a computer device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 8. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be realized through WIFI, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a target detection method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, keys, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 7 and 8 are block diagrams of only some of the structures associated with the aspects of the present application and are not intended to limit the computer device to which the aspects of the present application may be applied, and that a particular computer device may include more or less components than those shown, or may combine some of the components, or may have a different arrangement of components.
In an exemplary embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor, when executing the computer program, performing the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs with spatial resolution meeting preset conditions in the feature graphs with different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature images and the hierarchical information features of a plurality of different scales to obtain category information and position information of the target in the image to be detected.
In the computer equipment, firstly, the dimension of the extracted feature map is reduced, then the feature map after dimension reduction is input into a fully-connected network to obtain the hierarchical information feature, and finally, the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: carrying out global maximum pooling on the target feature map to obtain a first pooling feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: obtaining a hierarchical information feature map according to the feature maps and the hierarchical information features of a plurality of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: and fusing the feature graphs with different scales with the hierarchical information features to obtain the hierarchical information feature graph.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: fully connecting the dimension-reduced target feature graphs to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the processor when executing the computer program further performs the steps of: and selecting a plurality of feature images with different scales and the feature image with the minimum spatial resolution, and determining the feature image as a target feature image.
In an exemplary embodiment, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs with spatial resolution meeting preset conditions in the feature graphs with different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features;
and performing target detection according to the feature images and the hierarchical information features of a plurality of different scales to obtain category information and position information of the target in the image to be detected.
In the computer readable storage medium, the dimension of the extracted feature map is reduced, the feature map after dimension reduction is input into a fully connected network to obtain the hierarchical information feature, and finally the target detection task is realized based on the hierarchical information feature and the extracted feature map. By adding the hierarchical information features in the target detection process, classification features can be effectively improved, so that the accuracy of target detection is improved.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: carrying out global maximum pooling on the target feature map to obtain a first pooling feature; carrying out global average pooling on the target feature map to obtain a second pooled feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: obtaining a hierarchical information feature map according to the feature maps and the hierarchical information features of a plurality of different scales; determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the images to be detected; and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: and fusing the feature graphs with different scales with the hierarchical information features to obtain the hierarchical information feature graph.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: fully connecting the dimension-reduced target feature graphs to obtain a multi-dimension vector; determining a confidence score of each category according to the elements in the multi-dimensional vector; and selecting the category with the confidence score meeting the preset condition, and determining the category as the hierarchy category.
In an exemplary embodiment, the computer program when executed by the processor further performs the steps of: and selecting a plurality of feature images with different scales and the feature image with the minimum spatial resolution, and determining the feature image as a target feature image.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by way of a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may comprise the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this description.
The above examples merely represent a few embodiments of the present application; although described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and these fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.
Claims (10)
1. A method of target detection, the method comprising:
carrying out multi-scale feature extraction on an image to be detected to obtain a plurality of feature images with different scales;
determining target feature graphs of which the spatial resolutions meet preset conditions in the feature graphs of the multiple different scales;
performing dimension reduction on the target feature map to obtain a dimension-reduced target feature map;
inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; the hierarchical information features include hierarchical categories;
and performing target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
2. The method of claim 1, wherein dimension reduction is performed on the target feature map to obtain a dimension reduced target feature map, comprising:
carrying out global maximum pooling on the target feature map to obtain a first pooling feature;
carrying out global average pooling on the target feature map to obtain a second pooling feature;
and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
3. The method according to claim 1, wherein performing object detection according to the feature maps of the multiple different scales and the hierarchical information features to obtain category information and position information of the object in the image to be detected includes:
obtaining a hierarchical information feature map according to the feature maps of the multiple different scales and the hierarchical information features;
determining candidate areas in the hierarchical information feature map, and carrying out pooling operation on the candidate areas to extract features of targets in the image to be detected;
and inputting the characteristics of the target in the image to be detected into a fully-connected network to obtain the category information and the position information of the target in the image to be detected.
4. A method according to claim 3, wherein deriving a hierarchical information feature map from the plurality of feature maps of different scales and the hierarchical information feature comprises:
and fusing the feature graphs with different scales with the hierarchical information features to obtain a hierarchical information feature graph.
5. The method of claim 1, wherein inputting the reduced-dimension target feature map into a fully-connected network to obtain a hierarchical information feature comprises:
performing full connection on the dimension-reduced target feature map to obtain a multi-dimension vector;
determining a confidence score for each category according to the elements in the multi-dimensional vector;
and selecting a category with confidence score meeting a preset condition, and determining the category as the hierarchy category.
6. The method of claim 1, wherein determining a target feature map for which spatial resolution meets a preset condition for the plurality of feature maps of different scales comprises:
and selecting the feature map with the minimum spatial resolution among the feature maps with different scales, and determining the feature map as a target feature map.
7. An object detection device, the device comprising:
the feature map extraction module is used for carrying out multi-scale feature extraction on the image to be detected to obtain a plurality of feature maps with different scales;
the feature map determining module is used for determining target feature maps, the spatial resolution of which meets preset conditions, of the feature maps with different scales;
the feature map dimension reduction module is used for reducing the dimension of the target feature map to obtain a dimension-reduced target feature map;
the feature map input module is used for inputting the dimension-reduced target feature map into a fully-connected network to obtain hierarchical information features; the hierarchical information features include hierarchical categories;
and the target detection module is used for carrying out target detection according to the feature graphs of the multiple different scales and the hierarchical information features to obtain category information and position information of the target in the image to be detected.
8. The apparatus of claim 7, wherein the feature map dimension reduction module is specifically configured to perform global maximum pooling on the target feature map to obtain a first pooled feature; carrying out global average pooling on the target feature map to obtain a second pooling feature; and adding or splicing the first pooling feature and the second pooling feature to obtain the dimension-reduced target feature map.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010166856.1A CN111292377B (en) | 2020-03-11 | 2020-03-11 | Target detection method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010166856.1A CN111292377B (en) | 2020-03-11 | 2020-03-11 | Target detection method, device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111292377A CN111292377A (en) | 2020-06-16 |
CN111292377B (en) | 2024-01-23 |
Family
ID=71022977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010166856.1A Active CN111292377B (en) | 2020-03-11 | 2020-03-11 | Target detection method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111292377B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111768392B (en) * | 2020-06-30 | 2022-10-14 | 创新奇智(广州)科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN111814905A (en) * | 2020-07-23 | 2020-10-23 | 上海眼控科技股份有限公司 | Target detection method, target detection device, computer equipment and storage medium |
CN111881996A (en) * | 2020-08-03 | 2020-11-03 | 上海眼控科技股份有限公司 | Object detection method, computer device and storage medium |
CN114066818B (en) * | 2021-10-23 | 2023-04-07 | 广州市艾贝泰生物科技有限公司 | Cell detection analysis method, cell detection analysis device, computer equipment and storage medium |
CN114359340A (en) * | 2021-12-27 | 2022-04-15 | 中国电信股份有限公司 | Tracking method, device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034054A (en) * | 2018-07-24 | 2018-12-18 | 华北电力大学 | Harmonic wave multi-tag classification method based on LSTM |
CN109344821A (en) * | 2018-08-30 | 2019-02-15 | 西安电子科技大学 | Small target detecting method based on Fusion Features and deep learning |
WO2019101021A1 (en) * | 2017-11-23 | 2019-05-31 | 腾讯科技(深圳)有限公司 | Image recognition method, apparatus, and electronic device |
CN109886871A (en) * | 2019-01-07 | 2019-06-14 | 国家新闻出版广电总局广播科学研究院 | The image super-resolution method merged based on channel attention mechanism and multilayer feature |
Also Published As
Publication number | Publication date |
---|---|
CN111292377A (en) | 2020-06-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |