CN112819006B - Image processing method and device and electronic equipment - Google Patents

Info

Publication number
CN112819006B
Authority
CN
China
Prior art keywords
target image
feature map
shallow
target
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011632945.7A
Other languages
Chinese (zh)
Other versions
CN112819006A (en)
Inventor
陈孝良
冯大航
宁海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing SoundAI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing SoundAI Technology Co Ltd
Priority to CN202011632945.7A
Publication of CN112819006A
Application granted
Publication of CN112819006B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The disclosure provides an image processing method, an image processing device and electronic equipment. The image processing method comprises the following steps: acquiring a target image; performing convolution pooling operation on the target image to obtain a deep feature map of the target image; performing average pooling operation on the target image to obtain a shallow feature map of the target image; and generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image. The target feature map generated by the method comprises deep feature information of the target image and shallow feature information of the target image, so that the reliability of image processing can be improved.

Description

Image processing method and device and electronic equipment
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to an image processing method, an image processing device and electronic equipment.
Background
Convolutional neural networks are widely used in image recognition. A convolutional neural network extracts image features using convolutional layers and pooling layers, models the extracted features through fully connected layers, and finally identifies the content of the image using a normalized exponential function (softmax function).
The convolution operations a convolutional neural network performs on an image help extract image features. However, as the depth of the neural network increases, the network focuses more and more on local features of the image, so much of the information in the original image is ignored and the reliability of image processing is reduced.
Disclosure of Invention
The embodiments of the disclosure provide an image processing method, an image processing device and electronic equipment, to solve the problem that the existing image processing process focuses on local features of an image and ignores much of the information in the original image, resulting in low reliability of image processing.
To solve the above problems, the present disclosure is implemented as follows:
in a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring a target image;
performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
performing average pooling operation on the target image to obtain a shallow feature map of the target image;
and generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
In a second aspect, embodiments of the present disclosure further provide an image processing apparatus, including:
The acquisition module is used for acquiring a target image;
the first operation module is used for performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
the second operation module is used for carrying out average pooling operation on the target image to obtain a shallow feature map of the target image;
and the generating module is used for generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
In a third aspect, the disclosed embodiments also provide an electronic device including a processor, a memory, and a program stored on the memory and executable on the processor, the program implementing the steps of the image processing method as described above when executed by the processor.
In a fourth aspect, the embodiments of the present disclosure also provide a readable storage medium having stored thereon a program which, when executed by a processor, implements the steps of the image processing method applied to an electronic device as described above.
In the embodiment of the disclosure, a target feature map corresponding to a target image is generated based on a deep feature map and a shallow feature map of the target image, wherein the deep feature map is obtained through convolution pooling operation, and the shallow feature map is obtained through average pooling operation. In this way, the generated target feature map not only comprises the deep feature information of the target image, but also retains the shallow feature information of the target image, so that the reliability of image processing can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
FIG. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating image processing according to an exemplary embodiment;
FIG. 3 is a block diagram of an image processing apparatus according to an exemplary embodiment;
FIG. 4 is a block diagram of an electronic device according to an exemplary embodiment.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are some embodiments of the present disclosure, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without inventive effort, based on the embodiments in this disclosure are intended to be within the scope of this disclosure.
The terms "first," "second," and the like in this application are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The image processing method of the embodiment of the present disclosure is explained below.
Referring to fig. 1, fig. 1 is a flowchart illustrating an image processing method according to an exemplary embodiment. The image processing method of the embodiment of the disclosure is applied to the electronic equipment. In practical applications, the electronic device may be a mobile phone, a computer, a television, a wearable device, a vehicle-mounted device, or the like.
It should be noted that, the image processing method of the embodiment of the present disclosure may be implemented by a convolutional neural network model installed in an electronic device, or may be implemented by other manners, which is not limited in the embodiment of the present disclosure.
As shown in fig. 1, the image processing method may include the steps of:
in step 101, a target image is acquired.
In a specific implementation, the target image may be an original image to be processed, or may be an image obtained based on the original image.
For example: considering that the size of the image input to the model is fixed, optionally, the acquiring the target image includes: acquiring an original image; and adjusting the size of the original image to obtain the target image. In this way, the image input to the model can be adapted to the size requirements of the model, so the reliability of image processing can be improved.
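As an illustration, a minimal sketch of this resizing step (assuming PyTorch and bilinear interpolation; the patent does not fix a library, an interpolation mode or a target size):

```python
import torch
import torch.nn.functional as F

def acquire_target_image(original: torch.Tensor, size=(256, 256)) -> torch.Tensor:
    """Resize an original image (C, H, W) to the fixed size the model expects."""
    # interpolate expects a batch dimension, so add one and remove it afterwards
    return F.interpolate(original.unsqueeze(0), size=size, mode="bilinear",
                         align_corners=False).squeeze(0)

# usage: a hypothetical 3-channel image of arbitrary size
original = torch.rand(3, 480, 640)
target_image = acquire_target_image(original)  # shape (3, 256, 256)
```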
In step 102, a convolution pooling operation is performed on the target image, so as to obtain a deep feature map of the target image.
In a specific implementation, the number of times that the electronic device performs the convolution pooling operation may be greater than or equal to 1 and may be determined according to the actual situation; this is not limited in the embodiments of the present disclosure.
In step 103, an average pooling operation is performed on the target image, so as to obtain a shallow feature map of the target image.
In a specific implementation, the number of times that the electronic device performs the average pooling operation may be greater than or equal to 1 and may be determined according to the actual situation; this is not limited in the embodiments of the present disclosure.
In step 104, a target feature map corresponding to the target image is generated according to the deep feature map of the target image and the shallow feature map of the target image.
In one implementation manner, the electronic device may obtain the target feature map corresponding to the target image by stitching (concatenating) the deep feature map of the target image with the shallow feature map of the target image.
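For illustration, a minimal sketch of the stitching approach (assuming PyTorch and channel-wise concatenation of maps that already share a spatial size; the patent does not specify the stitching axis):

```python
import torch

def stitch(deep: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
    """Stitch deep and shallow feature maps along the channel axis.

    Both tensors must share batch and spatial sizes: (B, C_deep, H, W) and
    (B, C_shallow, H, W) -> (B, C_deep + C_shallow, H, W)."""
    return torch.cat([deep, shallow], dim=1)

# usage with hypothetical shapes
deep = torch.rand(1, 64, 32, 32)
shallow = torch.rand(1, 3, 32, 32)
target_fm = stitch(deep, shallow)  # (1, 67, 32, 32)
```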
In another implementation manner, optionally, the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image includes:
And combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image.
In this implementation manner, the electronic device may introduce an attention algorithm and, based on the attention algorithm, fuse the deep feature map of the target image with the shallow feature map of the target image to obtain the target feature map corresponding to the target image. Compared with obtaining the target feature map by splicing, the attention algorithm combines the shallow feature information and the deep feature information organically, realizing associations among deep features, among shallow features, and between deep and shallow features, thereby further improving the reliability of image processing.
In an embodiment of the present disclosure, optionally, after the generating the target feature map corresponding to the target image, the method further includes: processing the original image according to the target feature map corresponding to the target image. In a specific implementation, processing the original image may include performing image recognition on the original image or performing image segmentation on the original image, as determined by the actual situation; this is not limited in the embodiments of the present disclosure.
According to the image processing method, a target feature map corresponding to a target image is generated based on a deep feature map and a shallow feature map, wherein the deep feature map is obtained through convolution pooling operation, and the shallow feature map is obtained through average pooling operation. In this way, the generated target feature map not only comprises the deep feature information of the target image, but also retains the shallow feature information of the target image, so that the reliability of image processing can be improved.
In an embodiment of the present disclosure, optionally, performing a convolution pooling operation on the target image to obtain a deep feature map of the target image includes:
performing K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
and the performing an average pooling operation on the target image to obtain a shallow feature map of the target image includes:
performing K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image includes:
generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, wherein the K second shallow feature maps are determined based on the K first shallow feature maps.
This alternative embodiment is described in detail below:
1) Regarding the K convolution pooling operations.
In a specific implementation, N deep feature maps can be obtained through each convolution pooling operation, where N is a positive integer whose specific value may be determined according to the actual situation; this is not limited in the embodiments of the present disclosure. The sizes of the N deep feature maps output by different convolution pooling operations differ; specifically, the size of the N deep feature maps output by the (i+1)th convolution pooling operation is smaller than the size of the N deep feature maps output by the ith convolution pooling operation.
For example: the size of the target image is 128×128 pixels, the sizes of the N deep feature maps output by the first convolution pooling operation are 64×64 pixels, and the sizes of the N deep feature maps output by the second convolution pooling operation are 32×32 pixels.
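For illustration, a minimal sketch of one such convolution pooling stage (assuming PyTorch, a 3×3 convolution and 2×2 max pooling; the patent does not fix the kernel sizes or the pooling type used inside the convolution pooling operation):

```python
import torch
import torch.nn as nn

class ConvPoolStage(nn.Module):
    """One convolution pooling operation: a convolution followed by 2x2 pooling,
    so each stage halves the spatial size (128 -> 64 -> 32, as in the example).
    Each of the N output channels is one deep feature map."""
    def __init__(self, in_channels: int, n_deep_maps: int):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, n_deep_maps, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2)  # assumption: the patent says only "pooling"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.conv(x)))

# first convolution pooling operation on a 128x128 target image, with N = 64
stage = ConvPoolStage(in_channels=3, n_deep_maps=64)
target_image = torch.rand(1, 3, 128, 128)
deep_maps = stage(target_image)  # (1, 64, 64, 64): 64 deep feature maps of 64x64 pixels
```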
2) Regarding K averaging pooling operations.
In a first form, the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation; in a second form, the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations taken together. The average pooling may be implemented differently for each form, as follows:
For the K first shallow feature maps of the first form, the number of shallow feature maps obtained by each average pooling operation differs: the (i+1)th average pooling operation yields 1 more shallow feature map than the ith average pooling operation.
Optionally, the input of the (i+1)th average pooling operation may be the ith target feature map and the outputs of the ith average pooling operation, and the output may be a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are then the K first shallow feature maps.
In addition, the size of the shallow feature map obtained by the ith average pooling operation may be the same as the size of the deep feature map obtained by the ith convolution pooling operation, so that the electronic device may directly generate the kth target feature map corresponding to the target image according to the N deep feature maps output by the kth convolution pooling operation and the K shallow feature maps output by the kth average pooling operation.
For example: assuming the size of the target image is 128×128 pixels, the sizes of the N deep feature maps output by the first convolution pooling operation are all 64×64 pixels, and the sizes of the N deep feature maps output by the second convolution pooling operation are all 32×32 pixels.
Then the first average pooling operation outputs shallow feature map 1, whose size is 64×64 pixels. The inputs of the second average pooling operation are shallow feature map 1 and the 1st target feature map corresponding to the target image, and the outputs are shallow feature map 1.1 corresponding to shallow feature map 1 and shallow feature map 2 corresponding to the 1st target feature map, both of size 64×64 pixels. The inputs of the third average pooling operation are shallow feature map 1.1, shallow feature map 2, and the 2nd target feature map corresponding to the target image, and the outputs are shallow feature map 1.1.1 corresponding to shallow feature map 1.1, shallow feature map 2.1 corresponding to shallow feature map 2, and shallow feature map 3 corresponding to the 2nd target feature map, all of size 32×32 pixels. In this case, the K first shallow feature maps include shallow feature map 1.1.1, shallow feature map 2.1 and shallow feature map 3.
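A minimal sketch of one step of this first form (assuming PyTorch and adaptive average pooling to the required size; the shapes below are hypothetical):

```python
import torch
import torch.nn.functional as F

def next_avg_pool_step(prev_shallow_maps, target_fm, size):
    """One (i+1)-th average pooling operation of the first form: re-pool every
    shallow map produced by the previous operation and add one new shallow map
    for the i-th target feature map, so each step yields one more map."""
    pooled_prev = [F.adaptive_avg_pool2d(m, size) for m in prev_shallow_maps]
    new_map = F.adaptive_avg_pool2d(target_fm, size)
    return pooled_prev + [new_map]

# mirroring the example: the first operation pools the target image into shallow map 1
shallow_maps = [torch.rand(1, 3, 64, 64)]            # shallow feature map 1
target_fm_1 = torch.rand(1, 67, 64, 64)              # hypothetical 1st target feature map
shallow_maps = next_avg_pool_step(shallow_maps, target_fm_1, (64, 64))  # maps 1.1 and 2
```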
For the K first shallow feature maps of the second form, each average pooling operation yields 1 shallow feature map.
Optionally, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map.
In addition, the size of the shallow layer feature map obtained by the ith average pooling operation may be the same as the size of the deep layer feature map obtained by the ith convolution pooling operation.
For example: assuming the size of the target image is 128×128 pixels, the sizes of the N deep feature maps output by the first convolution pooling operation are all 64×64 pixels, and the sizes of the N deep feature maps output by the second convolution pooling operation are all 32×32 pixels.
Then the first average pooling operation outputs shallow feature map 1, whose size is 64×64 pixels. The input of the second average pooling operation is the 1st target feature map corresponding to the target image, and the output is shallow feature map 2 corresponding to the 1st target feature map, of size 64×64 pixels. The input of the third average pooling operation is the 2nd target feature map corresponding to the target image, and the output is shallow feature map 3 corresponding to the 2nd target feature map, of size 32×32 pixels. In this case, the K first shallow feature maps include shallow feature map 1, shallow feature map 2 and shallow feature map 3.
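A minimal sketch of the second form (assuming PyTorch and adaptive average pooling; the shapes of the target feature maps below are hypothetical):

```python
import torch
import torch.nn.functional as F

def avg_pool_to(x: torch.Tensor, size) -> torch.Tensor:
    """Average pooling whose output matches a given spatial size; the patent fixes
    only the size relation between shallow and deep maps, so adaptive average
    pooling is used here as one way to satisfy it."""
    return F.adaptive_avg_pool2d(x, size)

# first operation: the input is the target image itself
target_image = torch.rand(1, 3, 128, 128)
shallow_1 = avg_pool_to(target_image, (64, 64))      # shallow feature map 1

# (i+1)-th operation: the input is the i-th target feature map
target_fm_1 = torch.rand(1, 67, 64, 64)              # hypothetical 1st target feature map
shallow_2 = avg_pool_to(target_fm_1, (64, 64))       # shallow feature map 2

target_fm_2 = torch.rand(1, 67, 32, 32)              # hypothetical 2nd target feature map
shallow_3 = avg_pool_to(target_fm_2, (32, 32))       # shallow feature map 3
```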
3) Regarding acquisition of the Kth target feature map corresponding to the target image.
In a specific implementation, the size of the K second shallow feature maps may be the same as the size of the N deep feature maps output by the Kth convolution pooling operation. In this way, the electronic device can directly obtain the Kth target feature map corresponding to the target image from the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps.
For the K first shallow feature maps of the first implementation manner, the K first shallow feature maps may be directly determined as the K second shallow feature maps, that is, the K second shallow feature maps are the K first shallow feature maps.
As can be seen from the foregoing, in the embodiment of the present disclosure, optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps. In this embodiment, the electronic device may directly obtain the Kth target feature map corresponding to the target image based on the N deep feature maps output by the Kth convolution pooling operation and the K shallow feature maps output by the Kth average pooling operation, without further processing of the feature maps, so the speed of image processing may be improved.
For the K first shallow feature maps of the second form, the electronic device may obtain the K second shallow feature maps by processing the K first shallow feature maps. Optionally, after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further includes:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
In a specific implementation, the electronic device may perform a resizing operation on the K-1 shallow feature maps output by the first K-1 average pooling operations to obtain K-1 second shallow feature maps. It should be noted that the resizing operation does not lose the original image information; it only changes the size of the image. In addition, the shallow feature map output by the Kth average pooling operation is determined as a second shallow feature map. The K second shallow feature maps are thus obtained.
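A minimal sketch of this resizing step (assuming PyTorch and bilinear interpolation; the patent only requires that the adjusted sizes match the Kth shallow feature map):

```python
import torch
import torch.nn.functional as F

def build_second_shallow_maps(first_shallow_maps):
    """Resize the first K-1 shallow feature maps to the size of the K-th one and
    keep the K-th map as-is, yielding the K second shallow feature maps."""
    target_size = first_shallow_maps[-1].shape[-2:]
    resized = [F.interpolate(m, size=target_size, mode="bilinear", align_corners=False)
               for m in first_shallow_maps[:-1]]
    return resized + [first_shallow_maps[-1]]

# usage with the sizes from the example (shallow feature maps 1, 2 and 3)
maps = [torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 3, 32, 32)]
second_maps = build_second_shallow_maps(maps)  # all three are now 32x32
```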
As can be seen from the foregoing, in the embodiment of the present disclosure, optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further includes:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
In this optional embodiment, the input and output of each average pooling operation are a single shallow feature map, which reduces the average pooling computation and thus the processing burden on the electronic device.
In the embodiment of the present disclosure, in the first implementation manner, the electronic device may obtain the Kth target feature map corresponding to the target image by concatenating the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps. In the second implementation manner, the electronic device may introduce an attention algorithm and, based on the attention algorithm, fuse the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
The second implementation manner is described below:
Optionally, the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps includes:
acquiring N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and fusing, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
In this alternative embodiment, the Kth target feature map corresponding to the target image may be obtained by the following formula:
C_K = Σ_(j=1)^(FM_X) a_Kj · h_j
wherein C_K represents the Kth target feature map corresponding to the target image and can be understood as the Target output of the attention mechanism; FM_X represents the number of input feature maps (in this alternative embodiment, the input feature maps are the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, so FM_X = N + K) and can be understood as the number of Source inputs of the attention mechanism; h_j represents the jth input feature map and can be understood as a Source input of the attention mechanism; a_Kj represents the attention weight of the jth feature map when the Kth target feature map is output and may also be referred to as an attention distribution coefficient.
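The formula is a weighted sum of the input feature maps. Below is a minimal sketch of this fusion under the simplifying assumptions that all FM_X input maps share one shape and that each map h_j carries a single scalar weight a_Kj (PyTorch; the names are illustrative):

```python
import torch

def fuse_with_attention(feature_maps, weights):
    """Compute C_K = sum_j a_Kj * h_j over the FM_X = N + K input feature maps.

    feature_maps: list of FM_X tensors, all of the same shape (B, C, H, W);
    weights: 1-D tensor of FM_X scalar attention weights a_Kj."""
    stacked = torch.stack(feature_maps, dim=0)   # (FM_X, B, C, H, W)
    w = weights.view(-1, 1, 1, 1, 1)             # broadcast each a_Kj over its map
    return (w * stacked).sum(dim=0)              # the K-th target feature map C_K

# usage: fuse 5 hypothetical input maps with 5 weights
maps = [torch.rand(1, 8, 32, 32) for _ in range(5)]
c_k = fuse_with_attention(maps, torch.rand(5))
```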
In this optional embodiment, the attention algorithm combines the shallow feature information and the deep feature information organically, realizing the correlations among deep features, among shallow features, and between deep features and shallow features, so the reliability of image processing can be further improved.
Optionally, the obtaining N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps includes:
obtaining N feature vectors corresponding to the N deep feature maps output by the Kth convolution pooling operation and K feature vectors corresponding to the K second shallow feature maps;
multiplying each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and multiplying each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
In this alternative embodiment, the attention weight value may be obtained by the following formula:
a_Kj = m_K · w_j
wherein w_j represents the attention conversion vector, a vector that needs to be trained by the convolutional neural network model, and m_K represents the feature vector of a feature map. In specific implementations, the feature vector of a feature map may be obtained by taking the global average value of the feature map, but is not limited thereto.
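As an illustration of this weight computation, the sketch below follows the operational description (global-average each input feature map into a feature vector, then take its dot product with a trained attention conversion vector); it assumes PyTorch, equal channel counts across maps, and no normalization of the weights, since the patent does not specify one:

```python
import torch
import torch.nn.functional as F

def attention_weights(feature_maps, conversion_vector):
    """a_Kj as a dot product between the feature vector of the j-th feature map
    (its global average over spatial positions) and a trained conversion vector."""
    weights = []
    for fm in feature_maps:                          # fm: (B, C, H, W)
        v = F.adaptive_avg_pool2d(fm, 1).flatten(1)  # (B, C) global-average feature vector
        weights.append(v @ conversion_vector)        # (B,) one weight per sample
    return torch.stack(weights, dim=-1)              # (B, FM_X)

# usage: a trainable conversion vector for maps with 8 channels
conv_vec = torch.nn.Parameter(torch.randn(8))
maps = [torch.rand(1, 8, 32, 32) for _ in range(5)]
a = attention_weights(maps, conv_vec)  # (1, 5)
```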
It should be noted that the various optional implementations described in the embodiments of the present disclosure may be implemented in combination with one another, provided they do not conflict, or may be implemented separately; this is not limited in the embodiments of the present disclosure.
For ease of understanding, examples are illustrated below:
In order to enable the image to retain more shallow feature information, this embodiment may combine the shallow feature information and the deep feature information of the image by adopting an attention algorithm.
In this embodiment, the convolved feature maps are used to perform layer-by-layer attention calculation, and feature information in the original image is transferred to the feature maps through a layer-by-layer attention mechanism. The deep feature maps and the shallow feature maps serve as the Source of attention, and the target feature map obtained after the attention calculation serves as the Target of attention.
The image processing method of the present embodiment may include the steps of:
Step one: the pictures are resized to 256×256 and input into the convolutional neural network.
Step two: convolution pooling is performed using convolution kernels to extract features from the original image, obtaining a feature map.
Step three: the feature information of each layer is average-pooled, reduced to a size consistent with the current feature map, and spliced with the current feature map.
Step four: the attention weights of the features are calculated, and the upper- and lower-layer features of the feature map are weighted.
Step five: the attention weights of the local features are calculated, and the interdependencies within the feature map are weighted.
Step six: steps two to five are repeated until the convolution layers are exhausted.
Step seven: the network weights are updated using cross entropy (loss) as the network loss function.
The image processing procedure can be seen in fig. 2.
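For illustration, a minimal sketch of step seven above, the weight update driven by a cross-entropy loss (assuming PyTorch; the stand-in model, optimizer and learning rate below are placeholders, not taken from the patent):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 256 * 256, 10))  # stand-in for the network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()  # cross entropy as the network loss function

images = torch.rand(4, 3, 256, 256)   # a batch of resized pictures
labels = torch.randint(0, 10, (4,))   # hypothetical class labels

logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()                      # update the network weights
```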
The attention is calculated as follows:
C_K = Σ_(j=1)^(FM_X) a_Kj · h_j
wherein C_K is the output of the Kth convolution layer, FM_X represents the number of input feature maps, a_Kj represents the attention distribution coefficient of the jth feature in the Source input when the Kth feature is output as the Target, and h_j is the convolution feature of the jth input.
a_Kj is calculated as follows:
a_Kj = m_K · w_j
wherein w_j is the vector of length d_j generated by taking the global average of each layer of the current feature maps (128 × (d1+3) in fig. 2, where each layer of the feature maps forms a vector of length d1+3 after the global average is taken), and m_K is the attention conversion vector, a vector that needs to be trained.
The embodiments of the application can optimize optical character recognition (OCR) technology by combining the adjusted attention mechanism with the convolutional neural network, improving the model's feature extraction capability and thus the character recognition accuracy. Meanwhile, fewer neural network layers are used and the operation speed is higher.
Specifically, the attention mechanism is applied to OCR technology, and text semantics and picture recognition are integrated by utilizing the feature association capability of the attention mechanism, improving the performance of the OCR technology. In view of the specificity of the image recognition field, the calculation method of the attention mechanism is modified: local data are fused using a blocking method, and the attention calculation is then carried out.
Referring to fig. 3, fig. 3 is a block diagram of an image processing apparatus according to an exemplary embodiment. As shown in fig. 3, the image processing apparatus 300 includes:
an acquisition module 301, configured to acquire a target image;
a first operation module 302, configured to perform a convolution pooling operation on the target image, so as to obtain a deep feature map of the target image;
a second operation module 303, configured to perform an average pooling operation on the target image, so as to obtain a shallow feature map of the target image;
the generating module 304 is configured to generate a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
Optionally, the generating module 304 is specifically configured to:
and combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image.
Optionally, the first operation module 302 is specifically configured to perform K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
the second operation module 303 is specifically configured to perform K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating module 304 is specifically configured to generate a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, where the K second shallow feature maps are determined based on the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
the image processing apparatus 300 further includes:
the adjusting module is configured to adjust the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and the determining module is configured to determine the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
Optionally, the generating module 304 includes:
the first acquisition sub-module is used for acquiring N attention weight values corresponding to the N deep feature maps output by the K-th convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and the second acquisition submodule is configured to fuse, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
Optionally, the first obtaining sub-module includes:
a first obtaining unit, configured to obtain N feature vectors corresponding to the N deep feature maps output by the kth convolution pooling operation, and K feature vectors corresponding to the K second shallow feature maps;
the second acquisition unit is configured to multiply each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and the third acquisition unit is configured to multiply each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
The image processing apparatus 300 can implement each process in the embodiments of the method of the present disclosure and achieve the same beneficial effects, and in order to avoid repetition, a detailed description is omitted here.
Referring to fig. 4, fig. 4 is a block diagram of an electronic device shown according to an exemplary embodiment. As shown in fig. 4, the electronic device 400 includes: a processor 401, a memory 402, a user interface 403, a transceiver 404 and a bus interface.
Wherein, in the embodiment of the present disclosure, the electronic device 400 further includes: a program stored on the memory 402 and executable on the processor 401, which when executed by the processor 401, performs the steps of:
Acquiring a target image;
performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
performing average pooling operation on the target image to obtain a shallow feature map of the target image;
and generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image.
Optionally, the program when executed by the processor 401 implements the steps of:
performing K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
performing K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, wherein the K second shallow feature maps are determined based on the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps.
Optionally, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
the program when executed by the processor 401 implements the steps of:
after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further includes:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
Optionally, the program when executed by the processor 401 implements the steps of:
acquiring N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and fusing, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
Optionally, the program when executed by the processor 401 implements the steps of:
obtaining N feature vectors corresponding to the N deep feature maps output by the Kth convolution pooling operation and K feature vectors corresponding to the K second shallow feature maps;
multiplying each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and multiplying each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
In fig. 4, the bus architecture may comprise any number of interconnected buses and bridges, linking together various circuits including one or more processors, represented by the processor 401, and memory, represented by the memory 402. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators and power management circuits, which are well known in the art and therefore not described further herein. The bus interface provides an interface. The transceiver 404 may be a plurality of elements, i.e., comprising a transmitter and a receiver, providing a means for communicating with various other apparatuses over a transmission medium. For different user devices, the user interface 403 may also be an interface capable of externally or internally connecting to required devices, including but not limited to a keypad, a display, a speaker, a microphone, a joystick, and the like.
The processor 401 is responsible for managing the bus architecture and general processing, and the memory 402 may store data used by the processor 401 in performing operations.
The electronic device 400 can implement the respective processes in the above-described method embodiments, and in order to avoid repetition, a description thereof is omitted here.
The embodiments of the present disclosure further provide a readable storage medium on which a program is stored; when the program is executed by a processor, it implements each process of the foregoing image processing method embodiments and can achieve the same technical effects, which are not repeated here to avoid repetition. The readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the disclosure and the scope of the claims, which are all within the protection of the present disclosure.

Claims (8)

1. An image processing method, the method comprising:
acquiring a target image;
performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
performing average pooling operation on the target image to obtain a shallow feature map of the target image;
generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image;
the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image includes:
combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image;
and the performing a convolution pooling operation on the target image to obtain a deep feature map of the target image comprises:
performing K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
and the performing an average pooling operation on the target image to obtain a shallow feature map of the target image comprises:
performing K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image comprises:
generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, wherein the K second shallow feature maps are determined based on the K first shallow feature maps.
2. The method according to claim 1, wherein, in the case where the K first shallow feature maps are the K shallow feature maps output by the Kth average pooling operation, the input of the (i+1)th average pooling operation is the ith target feature map and the outputs of the ith average pooling operation, and the output is a shallow feature map corresponding to the ith target feature map and i shallow feature maps corresponding to the outputs of the ith average pooling operation; the K second shallow feature maps are the K first shallow feature maps.
3. The method according to claim 1, wherein, in the case where the K first shallow feature maps are the K shallow feature maps output by the K average pooling operations, the input of the (i+1)th average pooling operation is the ith target feature map, and the output is a shallow feature map corresponding to the ith target feature map;
after the performing the K average pooling operations to obtain the K first shallow feature maps and before the generating the Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps, the method further comprises:
adjusting the sizes of the K-1 shallow feature maps output by the first K-1 average pooling operations, the adjusted sizes of the K-1 shallow feature maps being the same as the size of the shallow feature map output by the Kth average pooling operation;
and determining the adjusted K-1 shallow feature maps and the shallow feature map output by the Kth average pooling operation as the K second shallow feature maps.
4. The method of claim 1, wherein the generating a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps comprises:
acquiring N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps;
and fusing, according to the N attention weight values and the K attention weight values, the N deep feature maps output by the Kth convolution pooling operation and the K second shallow feature maps to obtain the Kth target feature map corresponding to the target image.
5. The method of claim 4, wherein the obtaining N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation and K attention weight values corresponding to the K second shallow feature maps comprises:
obtaining N feature vectors corresponding to the N deep feature maps output by the Kth convolution pooling operation and K feature vectors corresponding to the K second shallow feature maps;
multiplying each feature vector of the N feature vectors by an attention conversion vector to obtain the N attention weight values corresponding to the N deep feature maps output by the Kth convolution pooling operation;
and multiplying each feature vector of the K feature vectors by the attention conversion vector to obtain the K attention weight values corresponding to the K second shallow feature maps.
6. An image processing apparatus, characterized in that the image processing apparatus comprises:
the acquisition module is used for acquiring a target image;
the first operation module is used for performing convolution pooling operation on the target image to obtain a deep feature map of the target image;
the second operation module is used for carrying out average pooling operation on the target image to obtain a shallow feature map of the target image;
the generation module is used for generating a target feature map corresponding to the target image according to the deep feature map of the target image and the shallow feature map of the target image;
the generating module is specifically configured to:
combining the deep feature map of the target image with the shallow feature map of the target image by adopting an attention algorithm to obtain a target feature map corresponding to the target image;
the first operation module is specifically configured to perform K convolution pooling operations to obtain N deep feature maps output by the Kth convolution pooling operation;
wherein the input of the first convolution pooling operation is the target image, and the output is N deep feature maps corresponding to the target image; the input of the (i+1)th convolution pooling operation is the ith target feature map corresponding to the target image, and the output is N deep feature maps corresponding to the ith target feature map; K, N and i are positive integers;
the second operation module is specifically configured to perform K average pooling operations to obtain K first shallow feature maps;
wherein the input of the first average pooling operation is the target image, and the output is a shallow feature map corresponding to the target image; the input of the (i+1)th average pooling operation comprises the ith target feature map, and the output comprises a shallow feature map corresponding to the ith target feature map;
and the generating module is specifically configured to generate a Kth target feature map corresponding to the target image according to the N deep feature maps output by the Kth convolution pooling operation and K second shallow feature maps, where the K second shallow feature maps are determined based on the K first shallow feature maps.
7. An electronic device comprising a processor, a memory and a program stored on the memory and executable on the processor, the program when executed by the processor implementing the steps of the image processing method according to any one of claims 1 to 5.
8. A readable storage medium, characterized in that the readable storage medium has stored thereon a program which, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 5.
CN202011632945.7A 2020-12-31 2020-12-31 Image processing method and device and electronic equipment Active CN112819006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011632945.7A CN112819006B (en) 2020-12-31 2020-12-31 Image processing method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011632945.7A CN112819006B (en) 2020-12-31 2020-12-31 Image processing method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112819006A CN112819006A (en) 2021-05-18
CN112819006B (en) 2023-12-22

Family

ID=75856572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011632945.7A Active CN112819006B (en) 2020-12-31 2020-12-31 Image processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112819006B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657415B (en) * 2021-10-21 2022-01-25 西安交通大学城市学院 Object detection method oriented to schematic diagram


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018233708A1 (en) * 2017-06-23 2018-12-27 华为技术有限公司 Method and device for detecting salient object in image
WO2020233010A1 (en) * 2019-05-23 2020-11-26 平安科技(深圳)有限公司 Image recognition method and apparatus based on segmentable convolutional network, and computer device
CN111104962A (en) * 2019-11-05 2020-05-05 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A-PSPNet: a PSPNet image semantic segmentation model fusing an attention mechanism; Gao Dan; Chen Jianying; Xie Ying; Journal of China Academy of Electronics and Information Technology (Issue 06); full text *
Image super-resolution reconstruction via a hierarchical feature fusion attention network; Lei Pengcheng; Liu Cong; Tang Jiangang; Peng Dunlu; Journal of Image and Graphics (Issue 09); full text *

Also Published As

Publication number Publication date
CN112819006A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN108681743B (en) Image object recognition method and device and storage medium
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
CN111144566B (en) Training method for neural network weight parameters, feature classification method and corresponding device
CN113361710B (en) Student model training method, picture processing device and electronic equipment
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
CN113837942A (en) Super-resolution image generation method, device, equipment and storage medium based on SRGAN
CN112819006B (en) Image processing method and device and electronic equipment
CN114495916B (en) Method, device, equipment and storage medium for determining insertion time point of background music
CN116205820A (en) Image enhancement method, target identification method, device and medium
CN108170751A (en) For handling the method and apparatus of image
CN112614110B (en) Method and device for evaluating image quality and terminal equipment
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
WO2024082602A1 (en) End-to-end visual odometry method and apparatus
CN115578614B (en) Training method of image processing model, image processing method and device
CN115393868B (en) Text detection method, device, electronic equipment and storage medium
US11830204B2 (en) Systems and methods for performing motion transfer using a learning model
CN115731451A (en) Model training method and device, electronic equipment and storage medium
CN113362260A (en) Image optimization method and device, storage medium and electronic equipment
CN113012072A (en) Image motion deblurring method based on attention network
CN111985510B (en) Generative model training method, image generation device, medium, and terminal
CN116071625B (en) Training method of deep learning model, target detection method and device
CN111079624B (en) Sample information acquisition method and device, electronic equipment and medium
CN115311152A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN116912838A (en) License plate recognition method, license plate recognition device, license plate recognition terminal, license plate recognition computer program and license plate recognition storage medium
CN117831075A (en) Human skeleton key point reasoning method and device for video stream analysis training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant