CN112819006B - Image processing method and device and electronic equipment - Google Patents

Info

Publication number
CN112819006B
Authority
CN
China
Prior art keywords
target image
feature map
shallow
target
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011632945.7A
Other languages
Chinese (zh)
Other versions
CN112819006A (en)
Inventor
陈孝良
冯大航
宁海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202011632945.7A
Publication of CN112819006A
Application granted
Publication of CN112819006B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides an image processing method, an image processing device and electronic equipment. The image processing method comprises the following steps: acquiring a target image; performing convolution pooling operation on the target image to obtain a deep feature map of the target image; performing average pooling operation on the target image to obtain a shallow feature map of the target image; and generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image. The target feature map generated by the method comprises deep feature information of the target image and shallow feature information of the target image, so that the reliability of image processing can be improved.

Description

Image processing method and device and electronic equipment
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to an image processing method, an image processing device and electronic equipment.
Background
Convolutional neural networks are widely used in image recognition. A convolutional neural network extracts image features using convolutional layers and pooling layers, models the extracted features through fully connected layers, and finally identifies the content of the image using a normalized exponential function (softmax function).
The convolution operations a convolutional neural network performs on an image help extract image features. However, as the depth of the neural network increases, the network focuses more and more on local features of the image, so much of the information in the original image is ignored and the reliability of image processing is reduced.
Disclosure of Invention
The embodiments of the disclosure provide an image processing method, an image processing device and electronic equipment, to solve the problem that the existing image processing process focuses on local features of an image and ignores much of the information in the original image, resulting in low reliability of image processing.
To solve the above problems, the present disclosure is implemented as follows:
in a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring a target image;
performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
performing average pooling operation on the target image to obtain a shallow feature map of the target image;
and generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
In a second aspect, embodiments of the present disclosure further provide an image processing apparatus, including:
The acquisition module is used for acquiring a target image;
the first operation module is used for performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
the second operation module is used for carrying out average pooling operation on the target image to obtain a shallow feature map of the target image;
and the generating module is used for generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
In a third aspect, the disclosed embodiments also provide an electronic device including a processor, a memory, and a program stored on the memory and executable on the processor, the program implementing the steps of the image processing method as described above when executed by the processor.
In a fourth aspect, the embodiments of the present disclosure also provide a readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the image processing method applied to an electronic device as described above.
In the embodiment of the disclosure, a target feature map corresponding to a target image is generated based on a deep feature map and a shallow feature map of the target image, wherein the deep feature map is obtained through convolution pooling operation, and the shallow feature map is obtained through average pooling operation. In this way, the generated target feature map not only comprises the deep feature information of the target image, but also retains the shallow feature information of the target image, so that the reliability of image processing can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
FIG. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating image processing according to an exemplary embodiment;
FIG. 3 is a block diagram of an image processing apparatus according to an exemplary embodiment;
FIG. 4 is a block diagram of an electronic device according to an exemplary embodiment.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
The terms "first," "second," and the like in this application are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The image processing method of the embodiment of the present disclosure is explained below.
Referring to fig. 1, fig. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment. The image processing method of the embodiment of the disclosure is applied to the electronic equipment. In practical applications, the electronic device may be a mobile phone, a computer, a television, a wearable device, a vehicle-mounted device, or the like.
It should be noted that, the image processing method of the embodiment of the present disclosure may be implemented by a convolutional neural network model installed in an electronic device, or may be implemented by other manners, which is not limited in the embodiment of the present disclosure.
As shown in fig. 1, the image processing method may include the steps of:
in step 101, a target image is acquired.
In a specific implementation, the target image may be an original image to be processed, or may be an image obtained based on the original image.
For example: considering that the size of the image input to the model is fixed, optionally, the acquiring the target image includes: acquiring an original image; and adjusting the size of the original image to obtain the target image. In this way, the image input to the model can be adapted to the size requirements of the model, so the reliability of image processing can be improved.
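As an illustration, a minimal sketch of this resizing step (assuming PyTorch and bilinear interpolation; the patent does not fix a library, an interpolation mode or a target size):

```python
import torch
import torch.nn.functional as F

def acquire_target_image(original: torch.Tensor, size=(256, 256)) -> torch.Tensor:
    """Resize an original image (C, H, W) to the fixed size the model expects."""
    # interpolate expects a batch dimension, so add one and remove it afterwards
    return F.interpolate(original.unsqueeze(0), size=size, mode="bilinear",
                         align_corners=False).squeeze(0)

# usage: a hypothetical 3-channel image of arbitrary size
original = torch.rand(3, 480, 640)
target_image = acquire_target_image(original)  # shape (3, 256, 256)
```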
In step 102, a convolution pooling operation is performed on the target image, so as to obtain a deep feature map of the target image.
In a specific implementation, the number of times that the electronic device performs the convolution pooling operation may be greater than or equal to 1 and may be determined according to the actual situation; this is not limited in the embodiments of the present disclosure.
In step 103, an average pooling operation is performed on the target image, so as to obtain a shallow feature map of the target image.
In a specific implementation, the number of times that the electronic device performs the average pooling operation may be greater than or equal to 1 and may be determined according to the actual situation; this is not limited in the embodiments of the present disclosure.
In step 104, a target feature map corresponding to the target image is generated according to the deep feature map of the target image and the shallow feature map of the target image.
In one implementation manner, the electronic device may obtain the target feature map corresponding to the target image by stitching (concatenating) the deep feature map of the target image with the shallow feature map of the target image.
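For illustration, a minimal sketch of the stitching approach (assuming PyTorch and channel-wise concatenation of maps that already share a spatial size; the patent does not specify the stitching axis):

```python
import torch

def stitch(deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
    """Stitch deep and shallow feature maps along the channel axis.

    Both tensors must share batch and spatial sizes: (B, C_deep, H, W) and
    (B, C_shallow, H, W) -> (B, C_deep + C_shallow, H, W)."""
    return torch.cat([deep, shallow], dim=1)

# usage with hypothetical shapes
deep = torch.rand(1, 64, 32, 32)
shallow = torch.rand(1, 3, 32, 32)
target_fm = stitch(deep, shallow)  # (1, 67, 32, 32)
```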
In another implementation manner, optionally, the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image includes:
And combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image.
In this implementation manner, the electronic device may introduce an attention algorithm and, based on the attention algorithm, fuse the deep feature map of the target image with the shallow feature map of the target image to obtain the target feature map corresponding to the target image. Compared with obtaining the target feature map by splicing, the attention algorithm combines the shallow feature information and the deep feature information organically, realizing associations among deep features, among shallow features, and between deep and shallow features, thereby further improving the reliability of image processing.
In an embodiment of the present disclosure, optionally, after the generating the target feature map corresponding to the target image, the method further includes: processing the original image according to the target feature map corresponding to the target image. In a specific implementation, processing the original image may include performing image recognition on the original image or performing image segmentation on the original image, as determined by the actual situation; this is not limited in the embodiments of the present disclosure.
According to the image processing method, a target feature map corresponding to a target image is generated based on a deep feature map and a shallow feature map, wherein the deep feature map is obtained through convolution pooling operation, and the shallow feature map is obtained through average pooling operation. In this way, the generated target feature map not only comprises the deep feature information of the target image, but also retains the shallow feature information of the target image, so that the reliability of image processing can be improved.
In an embodiment of the present disclosure, optionally, performing a convolution pooling operation on the target image to obtain a deep feature map of the target image includes:
performing K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
and the performing an average pooling operation on the target image to obtain a shallow feature map of the target image includes:
performing K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image includes:
generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, wherein the K second shallow feature maps are determined based on the K first shallow feature maps.
This alternative embodiment is described in detail below:
1) Regarding the K convolution pooling operations.
In a specific implementation, N deep feature maps can be obtained through each convolution pooling operation, where N is a positive integer whose specific value may be determined according to the actual situation; this is not limited in the embodiments of the present disclosure. The sizes of the N deep feature maps output by different convolution pooling operations differ; specifically, the size of the N deep feature maps output by the (i+1)th convolution pooling operation is smaller than the size of the N deep feature maps output by the ith convolution pooling operation.
For example: the size of the target image is 128×128 pixels, the sizes of the N deep feature maps output by the first convolution pooling operation are 64×64 pixels, and the sizes of the N deep feature maps output by the second convolution pooling operation are 32×32 pixels.
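For illustration, a minimal sketch of one such convolution pooling stage (assuming PyTorch, a 3×3 convolution and 2×2 max pooling; the patent does not fix the kernel sizes or the pooling type used inside the convolution pooling operation):

```python
import torch
import torch.nn as nn

class ConvPoolStage(nn.Module):
    """One convolution pooling operation: a convolution followed by 2x2 pooling,
    so each stage halves the spatial size (128 -> 64 -> 32, as in the example).
    Each of the N output channels is one deep feature map."""
    def __init__(self, in_channels: int, n_deep_maps: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, n_deep_maps, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)  # assumption: the patent says only "pooling"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.conv(x)))

# first convolution pooling operation on a 128x128 target image, with N = 64
stage = ConvPoolStage(in_channels=3, n_deep_maps=64)
target_image = torch.rand(1, 3, 128, 128)
deep_maps = stage(target_image)  # (1, 64, 64, 64): 64 deep feature maps of 64x64 pixels
```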
2) Regarding K averaging pooling operations.
In a first form, the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation; in a second form, the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations taken together. The average pooling may be implemented differently for each form, as follows:
For the K first shallow feature maps of the first form, the number of shallow feature maps obtained by each average pooling operation differs: the (i+1)th average pooling operation yields 1 more shallow feature map than the ith average pooling operation.
Optionally, the input of the (i+1)th average pooling operation may be the ith target feature map and the outputs of the ith average pooling operation, and the output may be a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are then the K first shallow feature maps.
In addition, the size of the shallow feature map obtained by the ith average pooling operation may be the same as the size of the deep feature map obtained by the ith convolution pooling operation, so that the electronic device may directly generate the kth target feature map corresponding to the target image according to the N deep feature maps output by the kth convolution pooling operation and the K shallow feature maps output by the kth average pooling operation.
For example: assuming the size of the target image is 128×128 pixels, the sizes of the N deep feature maps output by the first convolution pooling operation are all 64×64 pixels, and the sizes of the N deep feature maps output by the second convolution pooling operation are all 32×32 pixels.
Then the first average pooling operation outputs shallow feature map 1, whose size is 64×64 pixels. The inputs of the second average pooling operation are shallow feature map 1 and the 1st target feature map corresponding to the target image, and the outputs are shallow feature map 1.1 corresponding to shallow feature map 1 and shallow feature map 2 corresponding to the 1st target feature map, both of size 64×64 pixels. The inputs of the third average pooling operation are shallow feature map 1.1, shallow feature map 2, and the 2nd target feature map corresponding to the target image, and the outputs are shallow feature map 1.1.1 corresponding to shallow feature map 1.1, shallow feature map 2.1 corresponding to shallow feature map 2, and shallow feature map 3 corresponding to the 2nd target feature map, all of size 32×32 pixels. In this case, the K first shallow feature maps include shallow feature map 1.1.1, shallow feature map 2.1 and shallow feature map 3.
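A minimal sketch of one step of this first form (assuming PyTorch and adaptive average pooling to the required size; the shapes below are hypothetical):

```python
import torch
import torch.nn.functional as F

def next_avg_pool_step(prev_shallow_maps, target_fm, size):
    """One (i+1)-th average pooling operation of the first form: re-pool every
    shallow map produced by the previous operation and add one new shallow map
    for the i-th target feature map, so each step yields one more map."""
    pooled_prev = [F.adaptive_avg_pool2d(m, size) for m in prev_shallow_maps]
    new_map = F.adaptive_avg_pool2d(target_fm, size)
    return pooled_prev + [new_map]

# mirroring the example: the first operation pools the target image into shallow map 1
shallow_maps = [torch.rand(1, 3, 64, 64)]            # shallow feature map 1
target_fm_1 = torch.rand(1, 67, 64, 64)              # hypothetical 1st target feature map
shallow_maps = next_avg_pool_step(shallow_maps, target_fm_1, (64, 64))  # maps 1.1 and 2
```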
For the K first shallow feature maps of the second form, each average pooling operation yields 1 shallow feature map.
Optionally, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map.
In addition, the size of the shallow layer feature map obtained by the ith average pooling operation may be the same as the size of the deep layer feature map obtained by the ith convolution pooling operation.
For example: assuming the size of the target image is 128×128 pixels, the sizes of the N deep feature maps output by the first convolution pooling operation are all 64×64 pixels, and the sizes of the N deep feature maps output by the second convolution pooling operation are all 32×32 pixels.
Then the first average pooling operation outputs shallow feature map 1, whose size is 64×64 pixels. The input of the second average pooling operation is the 1st target feature map corresponding to the target image, and the output is shallow feature map 2 corresponding to the 1st target feature map, of size 64×64 pixels. The input of the third average pooling operation is the 2nd target feature map corresponding to the target image, and the output is shallow feature map 3 corresponding to the 2nd target feature map, of size 32×32 pixels. In this case, the K first shallow feature maps include shallow feature map 1, shallow feature map 2 and shallow feature map 3.
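A minimal sketch of the second form (assuming PyTorch and adaptive average pooling; the shapes of the target feature maps below are hypothetical):

```python
import torch
import torch.nn.functional as F

def avg_pool_to(x: torch.Tensor, size) -> torch.Tensor:
    """Average pooling whose output matches a given spatial size; the patent fixes
    only the size relation between shallow and deep maps, so adaptive average
    pooling is used here as one way to satisfy it."""
    return F.adaptive_avg_pool2d(x, size)

# first operation: the input is the target image itself
target_image = torch.rand(1, 3, 128, 128)
shallow_1 = avg_pool_to(target_image, (64, 64))      # shallow feature map 1

# (i+1)-th operation: the input is the i-th target feature map
target_fm_1 = torch.rand(1, 67, 64, 64)              # hypothetical 1st target feature map
shallow_2 = avg_pool_to(target_fm_1, (64, 64))       # shallow feature map 2

target_fm_2 = torch.rand(1, 67, 32, 32)              # hypothetical 2nd target feature map
shallow_3 = avg_pool_to(target_fm_2, (32, 32))       # shallow feature map 3
```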
3) Regarding acquisition of the Kth target feature map corresponding to the target image.
In a specific implementation, the size of the K second shallow feature maps may be the same as the size of the N deep feature maps output by the Kth convolution pooling operation. In this way, the electronic device can directly obtain the Kth target feature map corresponding to the target image from the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps.
For the K first shallow feature maps of the first implementation manner, the K first shallow feature maps may be directly determined as the K second shallow feature maps, that is, the K second shallow feature maps are the K first shallow feature maps.
As can be seen from the foregoing, in the embodiment of the present disclosure, optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps. In this embodiment, the electronic device may directly obtain the Kth target feature map corresponding to the target image based on the N deep feature maps output by the Kth convolution pooling operation and the K shallow feature maps output by the Kth average pooling operation, without further processing of the feature maps, so the speed of image processing may be improved.
For the K first shallow feature maps of the second form, the electronic device may obtain the K second shallow feature maps by processing the K first shallow feature maps. Optionally, after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further includes:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
In a specific implementation, the electronic device may perform a resizing operation on the K-1 shallow feature maps output by the first K-1 average pooling operations to obtain K-1 second shallow feature maps. It should be noted that the resizing operation does not lose the original image information; it only changes the size of the image. In addition, the shallow feature map output by the Kth average pooling operation is determined as a second shallow feature map. The K second shallow feature maps are thus obtained.
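A minimal sketch of this resizing step (assuming PyTorch and bilinear interpolation; the patent only requires that the adjusted sizes match the Kth shallow feature map):

```python
import torch
import torch.nn.functional as F

def build_second_shallow_maps(first_shallow_maps):
    """Resize the first K-1 shallow feature maps to the size of the K-th one and
    keep the K-th map as-is, yielding the K second shallow feature maps."""
    target_size = first_shallow_maps[-1].shape[-2:]
    resized = [F.interpolate(m, size=target_size, mode="bilinear", align_corners=False)
               for m in first_shallow_maps[:-1]]
    return resized + [first_shallow_maps[-1]]

# usage with the sizes from the example (shallow feature maps 1, 2 and 3)
maps = [torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 32, 32)]
second_maps = build_second_shallow_maps(maps)  # all three are now 32x32
```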
As can be seen from the foregoing, in the embodiment of the present disclosure, optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further includes:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
In this optional embodiment, the input and output of each average pooling operation are a single shallow feature map, which reduces the average pooling computation and thus the processing burden on the electronic device.
In the embodiment of the present disclosure, in the first implementation manner, the electronic device may obtain the Kth target feature map corresponding to the target image by concatenating the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps. In the second implementation manner, the electronic device may introduce an attention algorithm and, based on the attention algorithm, fuse the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
The second implementation manner is described below:
Optionally, the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps includes:
acquiring N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and fusing, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
In this alternative embodiment, the Kth target feature map corresponding to the target image may be obtained by the following formula:
C_K = Σ_(j=1)^(FM_X) a_Kj · h_j
wherein C_K represents the Kth target feature map corresponding to the target image and can be understood as the Target output of the attention mechanism; FM_X represents the number of input feature maps (in this alternative embodiment, the input feature maps are the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, so FM_X = N + K) and can be understood as the number of Source inputs of the attention mechanism; h_j represents the jth input feature map and can be understood as a Source input of the attention mechanism; a_Kj represents the attention weight of the jth feature map when the Kth target feature map is output and may also be referred to as an attention distribution coefficient.
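The formula is a weighted sum of the input feature maps. Below is a minimal sketch of this fusion under the simplifying assumptions that all FM_X input maps share one shape and that each map h_j carries a single scalar weight a_Kj (PyTorch; the names are illustrative):

```python
import torch

def fuse_with_attention(feature_maps, weights):
    """Compute C_K = sum_j a_Kj * h_j over the FM_X = N + K input feature maps.

    feature_maps: list of FM_X tensors, all of the same shape (B, C, H, W);
    weights: 1-D tensor of FM_X scalar attention weights a_Kj."""
    stacked = torch.stack(feature_maps, dim=0)   # (FM_X, B, C, H, W)
    w = weights.view(-1, 1, 1, 1, 1)             # broadcast each a_Kj over its map
    return (w * stacked).sum(dim=0)              # the K-th target feature map C_K

# usage: fuse 5 hypothetical input maps with 5 weights
maps = [torch.rand(1, 8, 32, 32) for _ in range(5)]
c_k = fuse_with_attention(maps, torch.rand(5))
```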
In this optional embodiment, the attention algorithm combines the shallow feature information and the deep feature information organically, realizing the correlations among deep features, among shallow features, and between deep features and shallow features, so the reliability of image processing can be further improved.
Optionally, the obtaining N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps includes:
obtaining N feature vectors corresponding to the N deep feature maps output by the Kth convolution pooling operation and K feature vectors corresponding to the K second shallow feature maps;
multiplying each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and multiplying each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
In this alternative embodiment, the attention weight value may be obtained by the following formula:
a_Kj = m_K · w_j
wherein w_j represents the attention conversion vector, a vector that needs to be trained by the convolutional neural network model, and m_K represents the feature vector of a feature map. In specific implementations, the feature vector of a feature map may be obtained by taking the global average value of the feature map, but is not limited thereto.
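As an illustration of this weight computation, the sketch below follows the operational description (global-average each input feature map into a feature vector, then take its dot product with a trained attention conversion vector); it assumes PyTorch, equal channel counts across maps, and no normalization of the weights, since the patent does not specify one:

```python
import torch
import torch.nn.functional as F

def attention_weights(feature_maps, conversion_vector):
    """a_Kj as a dot product between the feature vector of the j-th feature map
    (its global average over spatial positions) and a trained conversion vector."""
    weights = []
    for fm in feature_maps:                          # fm: (B, C, H, W)
        v = F.adaptive_avg_pool2d(fm, 1).flatten(1)  # (B, C) global-average feature vector
        weights.append(v @ conversion_vector)        # (B,) one weight per sample
    return torch.stack(weights, dim=-1)              # (B, FM_X)

# usage: a trainable conversion vector for maps with 8 channels
conv_vec = torch.nn.Parameter(torch.randn(8))
maps = [torch.rand(1, 8, 32, 32) for _ in range(5)]
a = attention_weights(maps, conv_vec)  # (1, 5)
```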
It should be noted that the various optional implementations described in the embodiments of the present disclosure may be implemented in combination with one another, provided they do not conflict, or may be implemented separately; this is not limited in the embodiments of the present disclosure.
For ease of understanding, examples are illustrated below:
In order to enable the image to retain more shallow feature information, this embodiment may combine the shallow feature information and the deep feature information of the image by adopting an attention algorithm.
In this embodiment, the convolved feature maps are used to perform layer-by-layer attention calculation, and feature information in the original image is transferred to the feature maps through a layer-by-layer attention mechanism. The deep feature maps and the shallow feature maps serve as the Source of attention, and the target feature map obtained after the attention calculation serves as the Target of attention.
The image processing method of the present embodiment may include the steps of:
Step one: the pictures are resized to 256×256 and input into the convolutional neural network.
Step two: convolution pooling is performed using convolution kernels to extract features from the original image, obtaining a feature map.
Step three: the feature information of each layer is average-pooled, reduced to a size consistent with the current feature map, and spliced with the current feature map.
Step four: the attention weights of the features are calculated, and the upper- and lower-layer features of the feature map are weighted.
Step five: the attention weights of the local features are calculated, and the interdependencies within the feature map are weighted.
Step six: steps two to five are repeated until the convolution layers are exhausted.
Step seven: the network weights are updated using cross entropy (loss) as the network loss function.
The image processing procedure can be seen in fig. 2.
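For illustration, a minimal sketch of step seven above, the weight update driven by a cross-entropy loss (assuming PyTorch; the stand-in model, optimizer and learning rate below are placeholders, not taken from the patent):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 256 * 256, 10))  # stand-in for the network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # cross entropy as the network loss function

images = torch.rand(4, 3, 256, 256)   # a batch of resized pictures
labels = torch.randint(0, 10, (4,))   # hypothetical class labels

logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()                      # update the network weights
```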
The attention is calculated as follows:
C_K = Σ_(j=1)^(FM_X) a_Kj · h_j
wherein C_K is the output of the Kth convolution layer, FM_X represents the number of input feature maps, a_Kj represents the attention distribution coefficient of the jth feature in the Source input when the Kth feature is output as the Target, and h_j is the convolution feature of the jth input.
a_Kj is calculated as follows:
a_Kj = m_K · w_j
wherein w_j is the vector of length d_j generated by taking the global average of each layer of the current feature maps (128 × (d1+3) in fig. 2, where each layer of the feature maps forms a vector of length d1+3 after the global average is taken), and m_K is the attention conversion vector, a vector that needs to be trained.
The embodiments of the application can optimize optical character recognition (OCR) technology by combining the adjusted attention mechanism with the convolutional neural network, improving the model's feature extraction capability and thus the character recognition accuracy. Meanwhile, fewer neural network layers are used and the operation speed is higher.
Specifically, the attention mechanism is applied to OCR technology, and text semantics and picture recognition are integrated by utilizing the feature association capability of the attention mechanism, improving the performance of the OCR technology. In view of the specificity of the image recognition field, the calculation method of the attention mechanism is modified: local data are fused using a blocking method, and the attention calculation is then carried out.
Referring to fig. 3, fig. 3 is a block diagram of an image processing apparatus according to an exemplary embodiment. As shown in fig. 3, the image processing apparatus 300 includes:
an acquisition module 301, configured to acquire a target image;
a first operation module 302, configured to perform a convolution pooling operation on the target image, so as to obtain a deep feature map of the target image;
a second operation module 303, configured to perform an average pooling operation on the target image, so as to obtain a shallow feature map of the target image;
the generating module 304 is configured to generate a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
Optionally, the generating module 304 is specifically configured to:
and combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image.
Optionally, the first operation module 302 is specifically configured to perform K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
the second operation module 303 is specifically configured to perform K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating module 304 is specifically configured to generate a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, where the K second shallow feature maps are determined based on the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
the image processing apparatus 300 further includes:
the adjusting module is configured to adjust the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and the determining module is configured to determine the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
Optionally, the generating module 304 includes:
the first acquisition sub-module is used for acquiring N attention weight values corresponding to the N deep feature maps output by the K-th convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and the second acquisition submodule is configured to fuse, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
Optionally, the first obtaining sub-module includes:
a first obtaining unit, configured to obtain N feature vectors corresponding to the N deep feature maps output by the kth convolution pooling operation, and K feature vectors corresponding to the K second shallow feature maps;
the second acquisition unit is configured to multiply each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and the third acquisition unit is configured to multiply each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
The image processing apparatus 300 can implement each process in the embodiments of the method of the present disclosure and achieve the same beneficial effects, and in order to avoid repetition, a detailed description is omitted here.
Referring to fig. 4, fig. 4 is a block diagram of an electronic device shown according to an exemplary embodiment. As shown in fig. 4, the electronic device 400 includes: a processor 401, a memory 402, a user interface 403, a transceiver 404 and a bus interface.
Wherein, in the embodiment of the present disclosure, the electronic device 400 further includes: a program stored on the memory 402 and executable on the processor 401, which when executed by the processor 401, performs the steps of:
Acquiring a target image;
performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
performing average pooling operation on the target image to obtain a shallow feature map of the target image;
and generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
Optionally, the program when executed by the processor 401 implements the steps of:
performing K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
performing K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, wherein the K second shallow feature maps are determined based on the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
the program when executed by the processor 401 implements the steps of:
after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further includes:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
Optionally, the program when executed by the processor 401 implements the steps of:
acquiring N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and fusing, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
Optionally, the program when executed by the processor 401 implements the steps of:
obtaining N feature vectors corresponding to the N deep feature maps output by the Kth convolution pooling operation and K feature vectors corresponding to the K second shallow feature maps;
multiplying each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and multiplying each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
In fig. 4, the bus architecture may comprise any number of interconnected buses and bridges, linking together various circuits including one or more processors, represented by the processor 401, and memory, represented by the memory 402. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and therefore not described further herein. The bus interface provides an interface. The transceiver 404 may be a plurality of elements, i.e., comprising a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium. For different user devices, the user interface 403 may also be an interface capable of externally or internally connecting to required devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like.
The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 in performing operations.
The electronic device 400 can implement the respective processes in the above-described method embodiments, and in order to avoid repetition, a description thereof is omitted here.
The embodiments of the present disclosure further provide a readable storage medium on which a program is stored; when the program is executed by a processor, it implements each process of the foregoing image processing method embodiments and can achieve the same technical effects, which are not repeated here to avoid repetition. The readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the disclosure and the scope of the claims, which are all within the protection of the present disclosure.

Claims (8)

1. An image processing method, the method comprising:
acquiring a target image;
performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
performing average pooling operation on the target image to obtain a shallow feature map of the target image;
generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image;
the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image includes:
combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image;
and the performing a convolution pooling operation on the target image to obtain a deep feature map of the target image comprises:
performing K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
and the performing an average pooling operation on the target image to obtain a shallow feature map of the target image comprises:
performing K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image comprises:
generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, wherein the K second shallow feature maps are determined based on the K first shallow feature maps.
2. The method according to claim 1, wherein, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps.
3. The method according to claim 1, wherein, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further comprises:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
4. The method of claim 1, wherein the generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps comprises:
acquiring N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and fusing, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
5. The method of claim 4, wherein the obtaining N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps comprises:
obtaining N feature vectors corresponding to the N deep feature maps output by the Kth convolution pooling operation and K feature vectors corresponding to the K second shallow feature maps;
multiplying each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and multiplying each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
6. An image processing apparatus, characterized in that the image processing apparatus comprises:
the acquisition module is used for acquiring a target image;
the first operation module is used for performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
the second operation module is used for carrying out average pooling operation on the target image to obtain a shallow feature map of the target image;
the generation module is used for generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image;
the generating module is specifically configured to:
combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image;
the first operation module is specifically configured to perform K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
the second operation module is specifically configured to perform K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating module is specifically configured to generate a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, where the K second shallow feature maps are determined based on the K first shallow feature maps.
7. An electronic device comprising a processor, a memory and a program stored on the memory and executable on the processor, the program when executed by the processor implementing the steps of the image processing method according to any one of claims 1 to 5.
8. A readable storage medium, characterized in that the readable storage medium has stored thereon a program which, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 5.
CN202011632945.7A 2020-12-31 2020-12-31 Image processing method and device and electronic equipment Active CN112819006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632945.7A CN112819006B (en) 2020-12-31 2020-12-31 Image processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011632945.7A CN112819006B (en) 2020-12-31 2020-12-31 Image processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112819006A CN112819006A (en) 2021-05-18
CN112819006B (en) 2023-12-22

Family

ID=75856572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011632945.7A Active CN112819006B (en) 2020-12-31 2020-12-31 Image processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112819006B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657415B (en) * 2021-10-21 2022-01-25 西安交通大学城市学院 Object detection method oriented to schematic diagram


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018233708A1 (en) * 2017-06-23 2018-12-27 华为技术有限公司 Method and device for detecting salient object in image
WO2020233010A1 (en) * 2019-05-23 2020-11-26 平安科技(深圳)有限公司 Image recognition method and apparatus based on segmentable convolutional network, and computer device
CN111104962A (en) * 2019-11-05 2020-05-05 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A-PSPNet: a PSPNet image semantic segmentation model fusing an attention mechanism; Gao Dan; Chen Jianying; Xie Ying; Journal of China Academy of Electronics and Information Technology (Issue 06); full text *
Image super-resolution reconstruction via a hierarchical feature fusion attention network; Lei Pengcheng; Liu Cong; Tang Jiangang; Peng Dunlu; Journal of Image and Graphics (Issue 09); full text *

Also Published As

Publication number Publication date
CN112819006A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN108681743B (en) Image object recognition method and device and storage medium
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
CN111144566B (en) Training method for neural network weight parameters, feature classification method and corresponding device
CN113361710B (en) Student model training method, picture processing device and electronic equipment
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN113837942A (en) Super-resolution image generation method, device, equipment and storage medium based on SRGAN
CN112819006B (en) Image processing method and device and electronic equipment
CN114495916B (en) Method, device, equipment and storage medium for determining insertion time point of background music
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN108170751A (en) For handling the method and apparatus of image
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
WO2024082602A1 (en) End-to-end visual odometry method and apparatus
CN115578614B (en) Training method of image processing model, image processing method and device
CN115393868B (en) Text detection method, device, electronic equipment and storage medium
US11830204B2 (en) Systems and methods for performing motion transfer using a learning model
CN115731451A (en) Model training method and device, electronic equipment and storage medium
CN113362260A (en) Image optimization method and device, storage medium and electronic equipment
CN113012072A (en) Image motion deblurring method based on attention network
CN111985510B (en) Generative model training method, image generation device, medium, and terminal
CN116071625B (en) Training method of deep learning model, target detection method and device
CN111079624B (en) Sample information acquisition method and device, electronic equipment and medium
CN115311152A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN116912838A (en) License plate recognition method, license plate recognition device, license plate recognition terminal, license plate recognition computer program and license plate recognition storage medium
CN117831075A (en) Human skeleton key point reasoning method and device for video stream analysis training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant