CN112232361A - Image processing method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN112232361A (application number CN202011091051.1A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- feature
- convolution
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application provides an image processing method and apparatus, an electronic device, and a computer-readable storage medium. The method acquires a multi-scale feature map of an image to be detected as a first feature map. For each first feature map, the position features and non-position features in it are identified, and a convolution operation is performed on the first feature map to obtain a second feature map; in this convolution operation, the weight of the convolution kernel corresponding to the position features is greater than the weight of the kernel corresponding to the non-position features, so the position features in the second feature map are enhanced. The third feature map, obtained by multiplying the second feature map with the first feature map, therefore carries enhanced position features. A fourth feature map for detecting the image to be detected is then generated from the third feature map, which improves the accuracy of detecting the image.
Description
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for image processing, an electronic device, and a computer-readable storage medium.
Background
In image processing, an FPN (Feature Pyramid Network) is often used to detect and process images. An FPN mainly comprises convolution operations and feature fusion processing: the convolution operations produce feature maps at different scales, and the fusion processing performs feature fusion calculations on the convolved feature maps to obtain a new feature map, which is then used for image processing tasks such as image detection.
Because the position features in a feature map obtained by downsampling (i.e., reducing the number of sampling points of the feature map through convolution operations in the FPN) are weakened, the new feature map obtained from the subsequent fusion calculation contains fewer position features, which affects the accuracy of image detection.
Disclosure of Invention
The inventors have found through research that the position features in a feature map obtained by downsampling are weakened because they become almost indistinguishable from the non-position features; that is, the position-feature information is no longer prominent in the downsampled feature map. The present application therefore provides an image processing method and apparatus that enhance the position features in a feature map before the feature fusion calculation, so as to solve the problem that the fused feature map contains too few position features and degrades image detection accuracy.
In order to achieve the above object, the present application provides the following technical solutions:
a method of image processing, comprising:
receiving an image to be detected;
acquiring a multi-scale feature map of the image to be detected as a first feature map;
for each first feature map, identifying the position features and non-position features in the first feature map;
performing a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than the weight of the convolution kernel corresponding to the non-position features;
multiplying the second feature map with the first feature map to obtain a third feature map; and
generating, according to the third feature map, a fourth feature map for detecting the image to be detected.
In the foregoing method, optionally, performing the convolution operation on the first feature map to obtain the second feature map includes:
performing the convolution operation on the first feature map multiple times to obtain the second feature map.
In the foregoing method, optionally, in the first convolution operation on the first feature map, the size of each convolution kernel is 1 × 1 × M, where M is the number of channels of the first feature map.
In the foregoing method, optionally, in any convolution operation after the first one, the size of each convolution kernel is 1 × 1 × N, where N is the number of convolution kernels in the preceding convolution operation.
In the above method, optionally, identifying the position features and non-position features in the first feature map and performing the convolution operation on the first feature map to obtain the second feature map includes:
inputting the first feature map into a pre-constructed spatial weight model to obtain the second feature map output by the model; the spatial weight model identifies the position features and non-position features in the first feature map and performs the convolution operation on it, with the weight of the convolution kernel corresponding to the position features greater than the weight of the kernel corresponding to the non-position features.
Optionally, in the method, generating the fourth feature map for detecting the image to be detected according to the third feature map includes:
taking the third feature map with the smallest scale as the first fourth feature map; and
for any third feature map other than the one with the smallest scale, performing a convolution calculation on it with a 1 × 1 convolution layer and performing a feature concatenation operation with its upsampled target fourth feature map to obtain the fourth feature map corresponding to that third feature map.
The target fourth feature map of a given third feature map is the fourth feature map corresponding to the third feature map whose scale is adjacent to and smaller than the given one among the multi-scale third feature maps; each fourth feature map is calculated at least from its corresponding third feature map.
Optionally, acquiring the multi-scale feature map of the image to be detected includes:
performing bottom-up path downsampling on the image to be detected with an FPN to obtain feature maps of the image at different scales.
An apparatus for image processing, comprising:
the receiving unit is used for receiving an image to be detected;
the acquisition unit is used for acquiring the multi-scale feature map of the image to be detected as a first feature map;
the identification unit is used for identifying, for each first feature map, the position features and non-position features in the first feature map;
the first operation unit is used for performing a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than the weight of the convolution kernel corresponding to the non-position features;
the second operation unit is used for multiplying the second feature map with the first feature map to obtain a third feature map;
and the generating unit is used for generating a fourth feature map for detecting the image to be detected according to the third feature map.
An electronic device, comprising: a processor and a memory for storing a program; the processor is used for running the program to realize the image processing method.
A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of image processing described above.
According to the method and apparatus, the multi-scale feature map of the image to be detected is acquired as the first feature map; for each first feature map, its position features and non-position features are identified, and a convolution operation is performed on it to obtain a second feature map. Because the weight of the convolution kernel corresponding to the position features is greater than the weight of the kernel corresponding to the non-position features in this convolution operation, the position features in the second feature map are enhanced, and the third feature map obtained by multiplying the second feature map with the first feature map carries enhanced position features. The fourth feature map generated from the third feature map for detecting the image to be detected therefore improves detection accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for image processing according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of image processing of an image processing model according to an embodiment of the present application;
fig. 3 is a schematic diagram of image processing of a spatial weight model according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the field of image processing, particularly image detection, it is common to extract image features with an FPN and detect the image based on the extracted features. However, because the FPN performs bottom-up path downsampling on the image to obtain the multi-scale feature map, the original position features of the image may be weakened, which may reduce the accuracy of subsequent image detection based on the multi-scale feature map.
The inventors have found that the position features in a feature map obtained by downsampling are weakened because they become almost indistinguishable from the non-position features; that is, the downsampled feature map weakens the position-feature information.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The method provided by the embodiments of the application is executed by a server with image processing capability.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application, and the method may include the following steps:
s101, receiving an image to be detected.
S102, acquiring a multi-scale characteristic diagram of the image to be detected as a first characteristic diagram.
A specific implementation of this step is: use the FPN to perform bottom-up path downsampling on the image to be detected, obtaining feature maps of the image at different scales, and take these feature maps as the first feature maps.
The number of multi-scale feature maps obtained by the FPN processing, and the scale of each feature map, can be set as required. For a specific implementation of downsampling the image with an FPN, refer to the prior art.
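As a non-authoritative sketch of this step, the bottom-up pyramid can be illustrated with plain NumPy. The 2 × 2 max pooling here is only a stand-in for the FPN's learned stride-2 convolution stages; the pooling choice, pyramid depth, and shapes are assumptions for illustration:

```python
import numpy as np

def downsample2x(fmap):
    """Halve spatial resolution with 2x2 max pooling (a stand-in for the
    FPN's learned stride-2 stages; layer details are assumptions)."""
    h, w, c = fmap.shape
    trimmed = fmap[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def bottom_up_pyramid(image, levels=4):
    """Return multi-scale first feature maps (C2..C5), coarsest last."""
    fmap, pyramid = image, []
    for _ in range(levels):
        fmap = downsample2x(fmap)
        pyramid.append(fmap)
    return pyramid

image = np.random.rand(64, 64, 3)
c2, c3, c4, c5 = bottom_up_pyramid(image)
print([f.shape for f in (c2, c3, c4, c5)])
# [(32, 32, 3), (16, 16, 3), (8, 8, 3), (4, 4, 3)] -- scales shrink bottom-up
```

C5, the coarsest map, corresponds to the smallest-scale first feature map referred to later in the description.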
S103, for each first feature map, identifying the position features and non-position features in the first feature map.
A position feature describes the location information of a target object in the image, such as the edges of its shape contour.
In this step, the position features and non-position features in the first feature map may be identified using a pre-trained position recognition model. The model may be obtained by training on samples carrying position-feature and non-position-feature labels; for the training procedure, refer to existing neural-network model training methods.
Alternatively, the position features and non-position features in the first feature map may be determined by existing methods for identifying the shape contour edges of a target object.
S104, performing a convolution operation on the first feature map to obtain a second feature map.
In the convolution operation, the weight of the convolution kernel corresponding to the position features is greater than the weight of the convolution kernel corresponding to the non-position features.
It should be noted that making the weights of the kernels corresponding to the position features larger than those corresponding to the non-position features differentiates the two kinds of features in the feature map, thereby highlighting and enhancing the position features.
This step may be implemented as follows: perform the convolution operation on the first feature map multiple times to obtain the second feature map, where in every convolution operation the weight of the kernel corresponding to the position features is greater than that corresponding to the non-position features. The number of convolution operations can be set as required; the purpose of each operation is to further enhance the position-feature information in the feature map.
The multiple convolution operations proceed as follows: the first convolution operation takes the first feature map as its input, and each subsequent convolution operation takes the feature map produced by the previous operation.
In the first convolution operation on the first feature map, the convolution kernel size is 1 × 1 × M, where M is the number of channels of the first feature map. In any subsequent convolution operation, the kernel size is 1 × 1 × N, where N is the number of convolution kernels in the previous operation.
For example, if the kernels of the first convolution operation have size 1 × 1 × M and there are 512 of them, the kernels of the second convolution operation have size 1 × 1 × 512.
Note that the convolution operations in this step do not change the scale of the feature map; the final second feature map has the same scale as the first feature map.
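The kernel-size rule above (1 × 1 × M first, then 1 × 1 × N) can be sketched with NumPy, since a 1 × 1 convolution is simply a per-position matrix product over channels. The kernel counts and random weights are illustrative only; in the patent the weights are learned so that position features receive larger values:

```python
import numpy as np

def conv1x1(fmap, kernels):
    """Apply N 1x1xM kernels to an HxWxM map: mixes channels at each
    position, so the output keeps the input's spatial scale."""
    return fmap @ kernels  # (H, W, M) @ (M, N) -> (H, W, N)

first_map = np.random.rand(8, 8, 16)   # first feature map, M = 16 channels
w1 = np.random.rand(16, 512)           # first operation: 512 kernels of size 1x1x16
w2 = np.random.rand(512, 512)          # second operation: kernels of size 1x1x512
second_map = conv1x1(conv1x1(first_map, w1), w2)
print(second_map.shape)                # (8, 8, 512) -- still the 8x8 scale
```

As the step requires, the spatial scale (8 × 8) never changes; only the channel dimension does.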
S105, multiplying the second feature map with the first feature map to obtain a third feature map.
In this embodiment, the second feature map and the first feature map have the same scale; the multiplication multiplies each feature value in the second feature map by the feature value at the same position in the first feature map to obtain the third feature map.
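A minimal NumPy illustration of this element-wise multiplication; the numbers are toy values, not learned weights:

```python
import numpy as np

# The second feature map acts as a per-position weight map: multiplying it
# element-wise into the first feature map scales each position individually,
# amplifying the positions that (by assumption) carry position features.
first_map = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
second_map = np.array([[2.0, 1.0],
                       [1.0, 2.0]])
third_map = second_map * first_map     # same scale as both inputs
print(third_map)                       # [[2. 2.] [3. 8.]]
```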
S106, generating, according to the third feature map, a fourth feature map for detecting the image to be detected.
A specific implementation of this step may include steps A1 and A2:
Step A1, take the third feature map with the smallest scale as the first fourth feature map.
In this embodiment, for a first feature map of each scale, the second feature map obtained by the convolution operations has the same scale as the first feature map, so the third feature map obtained by multiplying the two also has that scale. The third feature map with the smallest scale is therefore the one corresponding to the smallest-scale first feature map among the multi-scale first feature maps.
Step A2, for any third feature map other than the one with the smallest scale, perform a convolution calculation on it with a 1 × 1 convolution layer, then perform a feature concatenation operation with the upsampled target fourth feature map to obtain the fourth feature map corresponding to that third feature map.
The target fourth feature map of a given third feature map is the fourth feature map corresponding to the third feature map whose scale is adjacent to and smaller than the given one among the multi-scale third feature maps; each fourth feature map is calculated at least from its corresponding third feature map.
The 1 × 1 convolution makes the channel count of the resulting feature map match that of the target fourth feature map, and the upsampling makes the scale of the target fourth feature map match that of the convolved third feature map.
Performing the feature concatenation operation on the 1 × 1-convolved third feature map and the upsampled target fourth feature map, rather than adding them, avoids the aliasing effect.
In summary, the method acquires the multi-scale feature map of the image to be detected as the first feature map and, for each first feature map, identifies its position features and non-position features before convolving it into a second feature map. In the convolution operation, the weight of the convolution kernel corresponding to the position features is greater than the weight of the kernel corresponding to the non-position features, so the position features in the second feature map are enhanced; the third feature map obtained by multiplying the second feature map with the first feature map then carries enhanced position features, and the fourth feature map generated from it for detecting the image improves detection accuracy.
In the above embodiment, identifying the position features and non-position features in the first feature map and performing the convolution operation on the first feature map to obtain the second feature map may be completed by a pre-established spatial weight model.
The spatial weight model identifies the position features and non-position features in the first feature map and performs the convolution operation on it; in the convolution operation, the weight of the convolution kernel corresponding to the position features is greater than that of the kernel corresponding to the non-position features, so the position-feature information in the second feature map is enhanced.
In this embodiment, an image to be detected is processed by using an image processing model obtained by combining the FPN model and the spatial weight model. Fig. 2 is a schematic diagram of a process of processing a picture by an image processing model, and as shown in fig. 2, the process of processing the picture by the image processing model is divided into 3 parts:
the first part is a multi-scale first feature map generated by performing bottom-up path downsampling processing on a picture to be detected by using FPN, and C2, C3, C4 and C5 respectively represent first feature maps with different scales. The scale of the first feature map obtained by bottom-up path down-sampling is continuously reduced, and C5 is the first feature map with the smallest scale.
The second part calculates each first feature map Cn (n = 2, 3, 4, 5) with the spatial weight model to obtain the corresponding second feature map Rn (n = 2, 3, 4, 5). In Fig. 2, the horizontal line connecting Cn and Rn indicates that Cn is input to the spatial weight model, which outputs Rn after operating on Cn. Rn has the same size as Cn; for example, C2 and R2 have the same dimensions. Therefore, among Rn (n = 2, 3, 4, 5), R5 has the smallest scale and R2 the largest.
Fig. 3 is a schematic diagram of a specific processing procedure of the spatial weight model on the first feature map Cn.
The third part performs a feature fusion calculation on each Rn to obtain the final fourth feature maps Pn (n = 2, 3, 4, 5) for detecting the image. P5 is the R5 generated in the second part; the fusion process for P2, P3, and P4, shown in Fig. 2, is:
(1) upsample Pm+1 by a factor of 2 to obtain a feature map with the same scale as Rm, where m = 2, 3, 4;
(2) apply a 1 × 1 convolution to Rm to obtain a feature map with the same channel count as Pm+1, where m = 2, 3, 4;
(3) perform a feature concatenation operation on the feature maps obtained in (1) and (2) to obtain Pm, where m = 2, 3, 4. The concatenation replaces the addition in the traditional FPN structure, avoiding the aliasing effect caused by addition.
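Steps (1)–(3) can be sketched in NumPy as follows. Nearest-neighbour upsampling and a random projection matrix stand in for the learned 2× upsampling and 1 × 1 convolution; all shapes and channel counts are illustrative assumptions:

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an HxWxC feature map."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

def fuse(r_m, p_next, proj):
    """P_m = concat(1x1-conv(R_m), upsample(P_{m+1})).
    `proj` plays the role of the 1x1 convolution that matches channel
    counts; concatenation replaces FPN's addition to avoid aliasing."""
    projected = r_m @ proj               # step (2): 1x1 convolution on R_m
    upsampled = upsample2x(p_next)       # step (1): 2x upsampling of P_{m+1}
    return np.concatenate([projected, upsampled], axis=2)  # step (3)

r3 = np.random.rand(16, 16, 32)          # R3: one level finer than P4
p4 = np.random.rand(8, 8, 8)             # P4: the target fourth feature map
p3 = fuse(r3, p4, np.random.rand(32, 8)) # projection to P4's channel count
print(p3.shape)                          # (16, 16, 16)
```

The concatenation doubles the channel count relative to addition, which is one practical difference between the two fusion choices.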
In the method provided by this embodiment, an image processing model combining the FPN model and the spatial weight model processes the image to be detected. The spatial weight model identifies the position features and non-position features in the first feature map and performs convolution operations on it; because the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features, the position features in the second feature map output by the model are enhanced, which improves the accuracy of detecting the image.
Fig. 3 is a schematic diagram of the spatial weight model's specific processing of a first feature map Cn, in which the feature values of Cn are denoted Cn(p,q). As shown in Fig. 3, a small cube in Cn indicates that the feature values of all channels along the same line of Cn form a numerical sequence; the values Cn(p,q) may differ across the channels on the same line.
The specific processing process of the spatial weight model on the first feature map Cn comprises the following steps:
(1) Perform the first convolution operation on the first feature map with the first convolution layer to obtain feature map 1.
In Fig. 3, the small cubes in feature map 1 represent the numerical sequences produced by applying the first convolution layer to the corresponding numerical sequences of the small cubes in Cn.
In the first convolution operation, the size of the convolution kernel in the first convolution layer is 1 × 1 × M, where M is the number of channels in the first feature map Cn. The total number of convolution kernels in the first convolutional layer is 512 (the total number of convolution kernels may be other values, which is only an example).
In the first convolution operation, the weight of the convolution kernel corresponding to the position features is greater than the weight corresponding to the non-position features. During training, the spatial weight model can learn to distinguish the position-feature and non-position-feature information in Cn through the loss function and negative-feedback (backpropagation) calculations of the convolution process.
(2) Apply the ReLU activation function to feature map 1 to obtain feature map 2. The calculation is:
feature map 2 = ReLU(W1 ∗ Cn) = max(0, W1 ∗ Cn),
where the left-hand side denotes the feature values in feature map 2, W1 denotes the weights of the convolution kernels in the first convolution layer, and ∗ denotes the convolution operation.
(3) Perform the second convolution calculation on feature map 2 to obtain feature map 3. In Fig. 3, the small cubes in feature map 3 represent the numerical sequences produced by applying the second convolution layer to the corresponding numerical sequences of the small cubes in feature map 2.
Since the feature map obtained by the first convolution operation has 512 channels, the convolution kernels of the second convolution layer have size 1 × 1 × 512, and there are 512 of them (the total number may be other values; this is only an example). As in the first convolution layer, the weight of the kernels used for the position features of the feature map is greater than the weight of the kernels used for the non-position features.
(4) Normalize feature map 3 to obtain feature map 4. Feature map 4 is the second feature map of the above embodiment.
(5) Multiply feature map 4 by the first feature map Cn to obtain feature map 5. Feature map 5 is the third feature map of the above embodiment. In this embodiment, optionally, the spatial weight model may be pre-constructed as a model architecture that both produces the second feature map and multiplies it with the first feature map.
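Steps (1)-(5) can be collected into one small module. The sketch below is a hedged NumPy reading of the text, with two labeled assumptions: the normalization in step (4) is taken to be a sigmoid (the patent only says "normalize"), and the second layer is given M output channels so that feature map 4 can be multiplied element-wise with Cn (the patent states 512 kernels for that layer).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_weight_module(cn, w1, w2):
    # cn: first feature map, (H, W, M)
    # w1: first 1x1 conv weights, (M, 512)
    # w2: second 1x1 conv weights, (512, M) -- assumption: M output channels
    #     so the product with cn is well defined.
    fm1 = cn @ w1        # (1) first 1x1 convolution
    fm2 = relu(fm1)      # (2) ReLU activation
    fm3 = fm2 @ w2       # (3) second 1x1 convolution
    fm4 = sigmoid(fm3)   # (4) normalization into (0, 1) -> second feature map
    return fm4 * cn      # (5) element-wise product with Cn -> third feature map

cn = np.random.rand(8, 8, 64)          # illustrative sizes, not from the patent
w1 = np.random.rand(64, 512) * 0.01
w2 = np.random.rand(512, 64) * 0.01
fm5 = spatial_weight_module(cn, w1, w2)
```

Because the normalized map lies in (0, 1), the product attenuates non-position features while passing position features through largely unchanged, which matches the enhancement effect described above.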
It should be noted that performing two convolution operations in the spatial weight model is only an example. Optionally, the spatial weight model may be configured to perform more convolution operations, provided that in each convolutional layer the weight of the convolution kernel used to calculate the position features of the feature map is greater than the weight of the kernel used to calculate the non-position features.
In the method provided by this embodiment, because the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features in the convolution operations of the spatial weight model, the position features in the output second feature map are enhanced, which improves the accuracy of detecting the image to be detected.
Fig. 4 is a schematic structural diagram of an apparatus 400 for image processing according to the present application, which includes:
a receiving unit 401, configured to receive an image to be detected;
an obtaining unit 402, configured to obtain multi-scale feature maps of the image to be detected as first feature maps;
an identifying unit 403, configured to identify, for each first feature map, the position features and non-position features in the first feature map;
a first operation unit 404, configured to perform a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features;
a second operation unit 405, configured to multiply the second feature map by the first feature map to obtain a third feature map;
and a generating unit 406, configured to generate, according to the third feature map, a fourth feature map for detecting the image to be detected.
Optionally, the specific implementation manner in which the first operation unit 404 performs the convolution operation on the first feature map to obtain the second feature map is as follows: the convolution operation is performed on the first feature map multiple times to obtain the second feature map, wherein in any one convolution operation, the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features.
Optionally, in the convolution operation performed on the first feature map by the first operation unit 404, the size of the convolution kernel is 1 × 1 × M, where M is the number of channels of the first feature map.
Optionally, in any convolution operation after the first convolution operation performed by the first operation unit 404 on the first feature map, the size of the convolution kernel is 1 × 1 × N, where N is the number of convolution kernels of the previous convolution operation.
Optionally, the specific implementation manner in which the identifying unit 403 identifies the position features and non-position features in the first feature map and the first operation unit 404 performs the convolution operation on the first feature map to obtain the second feature map is as follows:
the first feature map is input into a pre-constructed spatial weight model, and the second feature map output by the spatial weight model is obtained. The spatial weight model is used to identify the position features and non-position features in the first feature map and to perform the convolution operation on the first feature map, wherein in the convolution operation, the weight of the convolution kernel corresponding to the position features is greater than the weight of the convolution kernel corresponding to the non-position features.
Optionally, the specific implementation manner of generating, by the generating unit 406, the fourth feature map for detecting the image to be detected according to the third feature map is as follows:
taking the third feature map with the smallest scale as the first fourth feature map;
for any third feature map other than the one with the smallest scale, performing a convolution calculation on it using a convolutional layer of size 1 × 1, and performing a feature connection operation between the result and the upsampled target fourth feature map, to obtain the fourth feature map corresponding to that third feature map;
here, the target fourth feature map of a given third feature map is the fourth feature map corresponding to the third feature map whose scale, among the multi-scale third feature maps, is adjacent to and smaller than that of the given third feature map; a third feature map and a fourth feature map correspond to each other when the fourth feature map is calculated at least from that third feature map.
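This top-down generation of the fourth feature maps can be sketched as follows. This is an assumption-laden reading: "feature connection" is interpreted as channel concatenation, the upsampling is nearest-neighbour, and all names and sizes are illustrative rather than from the patent.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of an (H, W, C) map.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_top_down(third_maps, lateral_weights):
    # third_maps: ordered from smallest scale to largest, each (H, W, C).
    # lateral_weights: one (C, C) 1x1-conv weight matrix per larger level.
    fourth = [third_maps[0]]              # smallest third map = first fourth map
    for fm, w in zip(third_maps[1:], lateral_weights):
        lateral = fm @ w                  # 1x1 convolution on the third map
        up = upsample2x(fourth[-1])       # upsample the target fourth map
        # "feature connection" read here as channel concatenation (assumption)
        fourth.append(np.concatenate([lateral, up], axis=-1))
    return fourth

maps = [np.random.rand(4 * 2**i, 4 * 2**i, 8) for i in range(3)]  # small -> large
weights = [np.random.rand(8, 8) for _ in range(2)]
fourth_maps = fuse_top_down(maps, weights)
```

With concatenation the channel count grows at each level; an implementation could equally add a 1 × 1 convolution after the connection to restore a fixed channel count.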
Optionally, the specific implementation manner in which the obtaining unit 402 obtains the multi-scale feature maps of the image to be detected is as follows: bottom-up path down-sampling processing is performed on the image to be detected using an FPN (feature pyramid network), so as to obtain feature maps of the image to be detected at different scales.
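The bottom-up path can be illustrated with a toy pyramid builder. This is only a sketch: a real FPN backbone obtains each scale with strided convolutions and residual blocks, whereas here max-pooling merely stands in for the halving of spatial resolution; the level count and sizes are assumptions.

```python
import numpy as np

def downsample2x(x):
    # 2x2 max-pooling with stride 2 on an (H, W, C) map (H and W even).
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def bottom_up_pyramid(features, levels=3):
    # Repeatedly halve the spatial size to obtain first feature maps
    # at several scales (the bottom-up path of the pyramid).
    maps = [features]
    for _ in range(levels - 1):
        maps.append(downsample2x(maps[-1]))
    return maps

pyramid = bottom_up_pyramid(np.random.rand(32, 32, 8), levels=3)
```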
The present application further provides an electronic device 500, whose schematic structural diagram is shown in Fig. 5, including a processor 501 and a memory 502. The memory 502 is used for storing an application program, and the processor 501 is used for executing the application program to implement the image processing method of the present application, that is, to perform the following steps:
receiving an image to be detected;
acquiring multi-scale feature maps of the image to be detected as first feature maps;
for each first feature map, identifying the position features and non-position features in the first feature map;
performing a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features;
multiplying the second feature map by the first feature map to obtain a third feature map;
and generating, according to the third feature map, a fourth feature map for detecting the image to be detected.
The present application also provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the image processing method of the present application, that is, to perform the following steps:
acquiring multi-scale feature maps of the image to be detected as first feature maps;
for each first feature map, identifying the position features and non-position features in the first feature map;
performing a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features;
multiplying the second feature map by the first feature map to obtain a third feature map;
and generating, according to the third feature map, a fourth feature map for detecting the image to be detected.
If the functions described in the methods of the embodiments of the present application are implemented in the form of software functional units and sold or used as independent products, they may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the technical solutions of the embodiments of the present application that contributes to the prior art, or a part of those technical solutions, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method of image processing, comprising:
receiving an image to be detected;
acquiring multi-scale feature maps of the image to be detected as first feature maps;
for each first feature map, identifying the position features and non-position features in the first feature map;
performing a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features;
multiplying the second feature map by the first feature map to obtain a third feature map;
and generating, according to the third feature map, a fourth feature map for detecting the image to be detected.
2. The method of claim 1, wherein the convolving the first feature map to obtain a second feature map comprises:
and performing the convolution operation on the first characteristic diagram for multiple times to obtain the second characteristic diagram.
3. The method according to claim 2, wherein in the convolution operation performed on the first feature map, a convolution kernel size is 1 × 1 × M, where M is the number of channels of the first feature map.
4. The method according to claim 3, wherein in any one of the convolution operations after the first convolution operation performed on the first feature map, the convolution kernel size is 1 × 1 × N, where N is the number of convolution kernels of the previous convolution operation.
5. The method according to any one of claims 1-3, wherein said identifying the position features and non-position features in the first feature map and said performing the convolution operation on the first feature map to obtain a second feature map comprise:
inputting the first characteristic diagram into a pre-constructed space weight model to obtain a second characteristic diagram output by the space weight model; the spatial weight model is used for identifying position features and non-position features in the first feature map and performing convolution operation on the first feature map, wherein in the convolution operation, the weight of a convolution kernel corresponding to the position features is larger than the weight of a convolution kernel corresponding to the non-position features.
6. The method according to claim 1, wherein generating a fourth feature map for detecting the image to be detected according to the third feature map comprises:
taking the third feature map with the smallest scale as a first fourth feature map;
performing a convolution calculation, using a convolutional layer of size 1 × 1, on any third feature map other than the third feature map with the smallest scale, and performing a feature connection operation between the result and the upsampled target fourth feature map, to obtain the fourth feature map corresponding to that third feature map;
wherein the target fourth feature map of a given third feature map is the fourth feature map corresponding to the third feature map whose scale, among the multi-scale third feature maps, is adjacent to and smaller than that of the given third feature map, a third feature map and a fourth feature map corresponding to each other when the fourth feature map is calculated at least from that third feature map.
7. The method of claim 1, wherein the obtaining of the multi-scale feature map of the image to be detected comprises:
and performing bottom-up path down-sampling processing on the image to be detected by using the FPN to obtain characteristic maps of the image to be detected with different scales.
8. An apparatus for image processing, comprising:
a receiving unit, used for receiving an image to be detected;
an acquisition unit, used for acquiring multi-scale feature maps of the image to be detected as first feature maps;
an identification unit, used for identifying, for each first feature map, the position features and non-position features in the first feature map;
a first operation unit, used for performing a convolution operation on the first feature map to obtain a second feature map, wherein in the convolution operation the weight of the convolution kernel corresponding to the position features is greater than that corresponding to the non-position features;
a second operation unit, used for multiplying the second feature map by the first feature map to obtain a third feature map;
and a generating unit, used for generating, according to the third feature map, a fourth feature map for detecting the image to be detected.
9. An electronic device, comprising: a processor and a memory for storing a program; the processor is configured to execute the program to implement the method of image processing according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of image processing according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011091051.1A CN112232361B (en) | 2020-10-13 | 2020-10-13 | Image processing method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112232361A true CN112232361A (en) | 2021-01-15 |
CN112232361B CN112232361B (en) | 2021-09-21 |
Family
ID=74112459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011091051.1A Active CN112232361B (en) | 2020-10-13 | 2020-10-13 | Image processing method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112232361B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486908A (en) * | 2021-07-13 | 2021-10-08 | 杭州海康威视数字技术股份有限公司 | Target detection method and device, electronic equipment and readable storage medium |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1738192A1 (en) * | 2004-04-08 | 2007-01-03 | Raytheon Company | System and method for dynamic weight processing |
CN106886023A (en) * | 2017-02-27 | 2017-06-23 | 中国人民解放军理工大学 | A kind of Radar Echo Extrapolation method based on dynamic convolutional neural networks |
CN108090836A (en) * | 2018-01-30 | 2018-05-29 | 南京信息工程大学 | Based on the equity investment method for weighting intensive connection convolutional neural networks deep learning |
CN108229497A (en) * | 2017-07-28 | 2018-06-29 | 北京市商汤科技开发有限公司 | Image processing method, device, storage medium, computer program and electronic equipment |
EP3385887A1 (en) * | 2017-04-08 | 2018-10-10 | INTEL Corporation | Sub-graph in frequency domain and dynamic selection of convolution implementation on a gpu |
CN109614876A (en) * | 2018-11-16 | 2019-04-12 | 北京市商汤科技开发有限公司 | Critical point detection method and device, electronic equipment and storage medium |
CN109996023A (en) * | 2017-12-29 | 2019-07-09 | 华为技术有限公司 | Image processing method and device |
CN110210571A (en) * | 2019-06-10 | 2019-09-06 | 腾讯科技(深圳)有限公司 | Image-recognizing method, device, computer equipment and computer readable storage medium |
CN110309876A (en) * | 2019-06-28 | 2019-10-08 | 腾讯科技(深圳)有限公司 | Object detection method, device, computer readable storage medium and computer equipment |
CN110807362A (en) * | 2019-09-23 | 2020-02-18 | 腾讯科技(深圳)有限公司 | Image detection method and device and computer readable storage medium |
CN110852349A (en) * | 2019-10-21 | 2020-02-28 | 上海联影智能医疗科技有限公司 | Image processing method, detection method, related equipment and storage medium |
CN111476252A (en) * | 2020-04-03 | 2020-07-31 | 南京邮电大学 | Computer vision application-oriented lightweight anchor-frame-free target detection method |
CN111507408A (en) * | 2020-04-17 | 2020-08-07 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111612075A (en) * | 2020-05-22 | 2020-09-01 | 中国科学院自动化研究所 | Interest point and descriptor extraction method based on joint feature recombination and feature mixing |
CN111738069A (en) * | 2020-05-13 | 2020-10-02 | 北京三快在线科技有限公司 | Face detection method and device, electronic equipment and storage medium |
CN111753730A (en) * | 2020-06-24 | 2020-10-09 | 国网电子商务有限公司 | Image examination method and device |
CN111754491A (en) * | 2020-06-28 | 2020-10-09 | 国网电子商务有限公司 | Picture definition judging method and device |
Non-Patent Citations (3)
Title |
---|
JINCHEN等: "Dynamic Region-Aware Convolution", 《HTTPS://ARXIV.ORG/ABS/2003.12243V1》 * |
朱威等: "基于动态图卷积和空间金字塔池化的点云深度学习网络", 《计算机科学》 * |
陈朋等: "基于改进动态配置的FPGA卷积神经网络加速器的优化方法", 《高技术通讯》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486908A (en) * | 2021-07-13 | 2021-10-08 | 杭州海康威视数字技术股份有限公司 | Target detection method and device, electronic equipment and readable storage medium |
CN113486908B (en) * | 2021-07-13 | 2023-08-29 | 杭州海康威视数字技术股份有限公司 | Target detection method, target detection device, electronic equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112232361B (en) | 2021-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shen et al. | Deep semantic face deblurring | |
CN108875523B (en) | Human body joint point detection method, device, system and storage medium | |
CN109376631B (en) | Loop detection method and device based on neural network | |
CN108875533B (en) | Face recognition method, device, system and computer storage medium | |
CN110610154A (en) | Behavior recognition method and apparatus, computer device, and storage medium | |
CN111160229B (en) | SSD network-based video target detection method and device | |
CN113837257B (en) | Target detection method and device | |
CN107730514A (en) | Scene cut network training method, device, computing device and storage medium | |
CN111275066A (en) | Image feature fusion method and device and electronic equipment | |
WO2017070923A1 (en) | Human face recognition method and apparatus | |
CN113781164B (en) | Virtual fitting model training method, virtual fitting method and related devices | |
CN114266894A (en) | Image segmentation method and device, electronic equipment and storage medium | |
CN113569070A (en) | Image detection method and device, electronic equipment and storage medium | |
CN112528318A (en) | Image desensitization method and device and electronic equipment | |
CN110782430A (en) | Small target detection method and device, electronic equipment and storage medium | |
CN112232361B (en) | Image processing method and device, electronic equipment and computer readable storage medium | |
US20220189151A1 (en) | Processing system, estimation apparatus, processing method, and non-transitory storage medium | |
CN111027670B (en) | Feature map processing method and device, electronic equipment and storage medium | |
CN111079643B (en) | Face detection method and device based on neural network and electronic equipment | |
CN110210279A (en) | Object detection method, device and computer readable storage medium | |
CN115546271B (en) | Visual analysis method, device, equipment and medium based on depth joint characterization | |
CN111353577B (en) | Multi-task-based cascade combination model optimization method and device and terminal equipment | |
CN112801045B (en) | Text region detection method, electronic equipment and computer storage medium | |
CN109978043A (en) | A kind of object detection method and device | |
US20230035671A1 (en) | Generating stereo-based dense depth images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
Address after: 100032 room 8018, 8 / F, building 7, Guangyi street, Xicheng District, Beijing Patentee after: State Grid Digital Technology Holdings Co.,Ltd. Patentee after: State Grid E-Commerce Technology Co., Ltd Address before: 311 guanganmennei street, Xicheng District, Beijing 100053 Patentee before: STATE GRID ELECTRONIC COMMERCE Co.,Ltd. Patentee before: State Grid E-Commerce Technology Co., Ltd |