WO2022247128A1 - Image processing method and apparatus, electronic device and storage medium - Google Patents


Info

Publication number
WO2022247128A1
Authority
WO
WIPO (PCT)
Prior art keywords
image block
feature
target
module
attention
Prior art date
Application number
PCT/CN2021/126096
Other languages
English (en)
Chinese (zh)
Inventor
陈博宇
李楚鸣
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2022247128A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to an image processing method and apparatus, an electronic device, and a storage medium.
  • the self-attention network has been widely used in natural language processing.
  • the self-attention network strengthens the features by establishing the connection between features, thereby improving the final performance of the network.
  • the self-attention network has also been applied on a large scale in the field of computer vision, showing great potential.
  • However, the design of existing visual self-attention networks simply copies the design used in natural language processing, without improvements for the characteristics of computer vision, resulting in poor performance of visual self-attention networks.
  • Embodiments of the present disclosure propose a technical solution of an image processing method and device, electronic equipment, and a storage medium.
  • An image processing method is provided, including: determining a first image block feature corresponding to a target image, and dividing the first image block feature into a second image block feature and a third image block feature; performing feature enhancement on the second image block feature based on a global attention mechanism to obtain a fourth image block feature, and performing feature enhancement on the third image block feature based on a local attention mechanism to obtain a fifth image block feature; determining a target image block feature corresponding to the target image according to the fourth image block feature and the fifth image block feature; and performing a target image processing operation on the target image according to the target image block feature to obtain an image processing result.
  • The channel number corresponding to the first image block feature is the target channel number; dividing the first image block feature into the second image block feature and the third image block feature includes: dividing the first image block feature in the channel dimension according to the target channel number and a target ratio to obtain the second image block feature and the third image block feature.
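The channel-dimension split described above can be sketched as follows. This is an illustrative sketch only: the values of L (number of image blocks), d_k (target channel number), and the target ratio r are assumptions, not values from the disclosure.

```python
import numpy as np

# Hypothetical example of dividing the first image block features along the
# channel dimension according to a target channel number and a target ratio.
L, d_k = 4, 8          # number of image blocks, target channel number (assumed)
r = 0.5                # target ratio of channels routed to the global branch (assumed)
first = np.arange(L * d_k, dtype=float).reshape(L, d_k)  # first image block features

split = int(d_k * r)
second = first[:, :split]   # second image block features -> global attention branch
third = first[:, split:]    # third image block features  -> local attention branch

print(second.shape, third.shape)  # (4, 4) (4, 4)
```

Concatenating the two parts back along the channel dimension recovers the original first image block features, which is what makes this a lossless split.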
  • Performing feature enhancement on the second image block feature based on the global attention mechanism to obtain the fourth image block feature includes: determining a first feature vector, a second feature vector, and a third feature vector according to the second image block feature; determining an attention feature map according to the first feature vector and the second feature vector; and determining the fourth image block feature according to the attention feature map and the third feature vector.
  • Performing feature enhancement on the third image block feature based on the local attention mechanism to obtain the fifth image block feature includes: performing convolution processing on the third image block feature according to a target convolution kernel and a first channel expansion rate to obtain the fifth image block feature.
  • Determining the target image block feature corresponding to the target image according to the fourth image block feature and the fifth image block feature includes: performing feature conversion on the fourth image block feature and the fifth image block feature according to a second channel expansion rate to obtain the target image block feature.
  • the image processing method is implemented by a self-attention neural network, wherein the self-attention neural network includes at least one attention module, and at least one of the attention modules includes a global attention A force sub-module, a local attention sub-module and a feed-forward sub-module;
  • Performing feature enhancement on the second image block feature based on the global attention mechanism to obtain the fourth image block feature includes: using the global attention sub-module to perform feature enhancement on the second image block feature based on the global attention mechanism to obtain the fourth image block feature. Performing feature enhancement on the third image block feature based on the local attention mechanism to obtain the fifth image block feature includes: using the local attention sub-module to perform feature enhancement on the third image block feature based on the local attention mechanism to obtain the fifth image block feature. Determining the target image block feature corresponding to the target image according to the fourth image block feature and the fifth image block feature includes: using the feed-forward sub-module to perform feature conversion on the fourth image block feature and the fifth image block feature to obtain the target image block feature corresponding to the target image.
  • The method further includes: constructing a first network structure search space and a second network structure search space, wherein the first network structure search space includes a plurality of module distribution hyperparameters and the second network structure search space includes a plurality of module structure hyperparameters; determining target module distribution hyperparameters from the plurality of module distribution hyperparameters according to the first network structure search space, wherein the target module distribution hyperparameters are used to indicate the number of global attention sub-modules and the number of local attention sub-modules included in each attention module; determining target module structure hyperparameters from the plurality of module structure hyperparameters according to the target module distribution hyperparameters and the second network structure search space, wherein the target module structure hyperparameters are used to indicate the target channel number corresponding to each attention module, the target convolution kernel and the first channel expansion rate corresponding to each local attention sub-module, and the second channel expansion rate corresponding to each feed-forward sub-module; and constructing the self-attention neural network according to the target module distribution hyperparameters and the target module structure hyperparameters.
  • Determining the target module distribution hyperparameters from the plurality of module distribution hyperparameters according to the first network structure search space includes: constructing a first super network according to the first network structure search space, wherein the first super network includes a plurality of first optional network structures constructed according to the plurality of module distribution hyperparameters; determining a first target network structure from the plurality of first optional network structures by performing network training on the first super network; and determining the module distribution hyperparameters corresponding to the first target network structure as the target module distribution hyperparameters.
  • Determining the target module structure hyperparameters from the plurality of module structure hyperparameters according to the target module distribution hyperparameters and the second network structure search space includes: constructing a second super network according to the target module distribution hyperparameters and the second network structure search space, wherein the second super network includes a plurality of second optional network structures constructed according to the plurality of module structure hyperparameters; determining a second target network structure from the plurality of second optional network structures by performing network training on the second super network; and determining the module structure hyperparameters corresponding to the second target network structure as the target module structure hyperparameters.
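The two-stage (hierarchical) selection above can be sketched in miniature. All concrete values here are illustrative assumptions, and a random mock score stands in for actually training and evaluating the super networks:

```python
import random

# Toy sketch of the hierarchical search: stage 1 picks a module-distribution
# hyperparameter (how many global vs. local sub-modules per attention module);
# stage 2, given that choice, picks module-structure hyperparameters (channel
# number, convolution kernel, channel expansion rate). The search spaces and
# the scoring function are placeholders, not the patent's actual procedure.
random.seed(0)
distribution_space = [(g, 4 - g) for g in range(5)]      # (global, local) counts
structure_space = [{"channels": c, "kernel": k, "expand": e}
                   for c in (64, 128) for k in (3, 5) for e in (2, 4)]

def mock_score(cfg):
    # Stand-in for evaluating a candidate sub-network of the trained super network.
    return random.random()

# Stage 1: choose the target module distribution hyperparameter.
target_dist = max(distribution_space, key=mock_score)
# Stage 2: choose the target module structure hyperparameter given stage 1.
target_struct = max(structure_space, key=mock_score)

print(target_dist in distribution_space, target_struct in structure_space)
```

The point of the two-stage structure is that the second search space is only explored after the module distribution is fixed, which keeps the combined search tractable.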
  • each of the local attention sub-modules corresponds to the same target convolution kernel and the first channel expansion rate.
  • An image processing device is provided, including: a feature determining part configured to determine a first image block feature corresponding to a target image and divide the first image block feature into a second image block feature and a third image block feature; an attention part configured to perform feature enhancement on the second image block feature based on a global attention mechanism to obtain a fourth image block feature, and to perform feature enhancement on the third image block feature based on a local attention mechanism to obtain a fifth image block feature; a first determination part configured to determine a target image block feature corresponding to the target image according to the fourth image block feature and the fifth image block feature; and a target image processing part configured to perform a target image processing operation on the target image according to the target image block feature to obtain an image processing result.
  • an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to call the instructions stored in the memory to execute the above-mentioned method.
  • a computer-readable storage medium on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above method is implemented.
  • a computer program including computer readable codes.
  • When the computer readable codes are run in an electronic device, a processor in the electronic device executes them to implement the above method.
  • The first image block feature corresponding to the target image is determined and divided into the second image block feature and the third image block feature; feature enhancement is performed on the second image block feature based on the global attention mechanism to obtain the fourth image block feature, and on the third image block feature based on the local attention mechanism to obtain the fifth image block feature; the target image block feature corresponding to the target image is determined according to the fourth image block feature and the fifth image block feature; and a target image processing operation is performed on the target image according to the target image block feature to obtain an image processing result.
  • In this way, feature enhancement is performed on the image block features so that the target image block features after enhancement include both global information and local information, which effectively improves the semantic expression ability of the target image block features. Furthermore, after the target image processing operation is performed using target image block features with high semantic expression ability, the accuracy of the image processing result can be improved.
  • FIG. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure
  • Fig. 2 shows a schematic diagram of determining features of multiple first image blocks corresponding to a target image according to an embodiment of the present disclosure
  • Fig. 3 shows a network structure diagram of a kind of self-attention neural network in the related art
  • FIG. 4 shows a network structure diagram of a self-attention neural network according to an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of a hierarchical network structure search according to an embodiment of the present disclosure
  • Fig. 6 shows a block diagram of an image processing device according to an embodiment of the present disclosure
  • Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure
  • Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • Fig. 1 shows a flowchart of an image processing method according to an embodiment of the present disclosure.
  • The image processing method can be executed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. The method may be implemented by a processor calling computer-readable instructions stored in a memory. Alternatively, the method may be performed by a server. As shown in Figure 1, the image processing method may include:
  • In step S11, a first image block feature corresponding to the target image is determined, and the first image block feature is divided into a second image block feature and a third image block feature.
  • the target image here is an image to be processed that requires image processing.
  • The target image can be segmented, that is, divided into multiple image blocks.
  • By extracting the features of each image block, multiple first image block features corresponding to the target image can be obtained.
  • the number of multiple image blocks may be determined according to actual conditions, which is not limited in the present disclosure.
  • Fig. 2 shows a schematic diagram of determining features of a first image block corresponding to a target image according to an embodiment of the present disclosure.
  • the target image is segmented, the target image is divided into L image blocks, feature extraction is performed on the L image blocks respectively, and L first image block features are obtained.
  • the multiple first image block features may be converted into a first image block feature sequence.
  • the manner of converting the multiple first image block features into the first image block feature sequence may be determined according to actual conditions, which is not limited in the present disclosure.
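The segmentation of the target image into L blocks and their flattening into a feature sequence can be sketched as follows. The image size H×W, the block size P, and the use of raw flattened pixels as "features" are illustrative assumptions; the disclosure does not fix a particular feature extractor.

```python
import numpy as np

# Illustrative sketch: divide a target image into L non-overlapping blocks and
# flatten each block into a vector, giving a sequence of L first image block
# features (one row per block).
H = W = 8; P = 4                      # image size and block size (assumed)
image = np.arange(H * W, dtype=float).reshape(H, W)

blocks = (image.reshape(H // P, P, W // P, P)
               .transpose(0, 2, 1, 3)   # group axes as (row-block, col-block, ...)
               .reshape(-1, P * P))     # L x (P*P): one flattened block per row
print(blocks.shape)                     # (4, 16), i.e. L = 4 first image block features
```

In practice the flattened blocks would typically be passed through a learned projection to obtain d_k-channel features, but the block-extraction step is the same.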
  • In step S12, feature enhancement is performed on the second image block features based on the global attention mechanism to obtain the fourth image block features, and on the third image block features based on the local attention mechanism to obtain the fifth image block features.
  • the global attention mechanism can refer to the process of determining the global attention relationship between all features, and then determining new features according to the global attention relationship. Therefore, based on the global attention mechanism, the global attention relationship between the features of the second image block is determined, and then the feature of the fourth image block is determined based on the global attention relationship, so as to realize feature enhancement. Wherein, since the fourth image block feature is determined based on the global attention relationship, the fourth image block feature includes global information of the target image.
  • the process of feature enhancement based on global attention will be described in detail later based on possible implementations of the present disclosure.
  • the local attention mechanism can refer to the process of determining the local attention relationship between local adjacent features, and then determining new features according to the local attention relationship. Therefore, based on the local attention mechanism, the local attention relationship between the features of the local adjacent third image block is determined, and then the feature of the fifth image block is determined based on the local attention relationship, so as to realize feature enhancement. Wherein, since the fifth image block feature is determined based on the local attention relationship, the fifth image block feature includes local information of the target image.
  • the process of feature enhancement based on local attention will be described in detail below based on possible implementations of the present disclosure.
  • In step S13, the target image block feature corresponding to the target image is determined according to the fourth image block feature and the fifth image block feature.
  • The target image block feature determined according to the fourth image block feature and the fifth image block feature includes both the global information and the local information of the target image, and therefore has a high semantic expression ability.
  • In step S14, the target image processing operation is performed on the target image according to the target image block feature to obtain an image processing result.
  • Since the target image block features include both global information and local information, they have high semantic expression ability. Therefore, according to the image processing requirements, performing the target image processing operation using the target image block features can obtain image processing results with high precision.
  • A plurality of first image block features corresponding to the target image are determined and divided into second image block features and third image block features; the second image block features are enhanced based on the global attention mechanism to obtain the fourth image block features, and the third image block features are enhanced based on the local attention mechanism to obtain the fifth image block features; the target image block feature corresponding to the target image is determined according to the fourth image block features and the fifth image block features; and a target image processing operation is performed on the target image according to the target image block feature to obtain an image processing result.
  • the feature enhancement of the image block features is carried out, so that the target image block features after feature enhancement include both global information and local information, which effectively improves the semantic expression ability of the target image block features. Furthermore, after the target image processing operation is performed using the target image block features with high semantic expression ability, the accuracy of the image processing result can be improved.
  • The channel number corresponding to the first image block feature is the target channel number; dividing the first image block feature into the second image block feature and the third image block feature includes: dividing the first image block feature in the channel dimension according to the target channel number and the target ratio to obtain the second image block feature and the third image block feature.
  • the second image block feature for global attention processing and the third image block feature for local attention processing can be obtained.
  • The numbers of second image block features and third image block features meet the target ratio, making preparations for the subsequent feature enhancement process. The values of the target channel number and the target ratio may be determined according to actual conditions, which is not limited in the present disclosure.
  • Each second image block feature group includes L second image block features, and the channel number corresponding to each second image block feature is d_k/N; each third image block feature group includes L third image block features, and the channel number corresponding to each third image block feature is d_k/N.
  • the value of N can be determined according to the actual situation, which is not limited in the present disclosure.
  • The second image block feature is enhanced based on the global attention mechanism to obtain the fourth image block feature, including: determining the first feature vector, the second feature vector, and the third feature vector according to the second image block feature; determining an attention feature map according to the first feature vector and the second feature vector; and determining the fourth image block feature according to the attention feature map and the third feature vector.
  • In this way, the global attention relationship between the second image block features can be obtained, and feature enhancement based on the global attention mechanism can then be realized according to the attention feature map, so that the fourth image block features including global information are effectively obtained.
  • Taking a second image block feature group as an example: for any second image block feature group, the L second image block features included in the group are converted into three different feature vectors, the first feature vector Q, the second feature vector K, and the third feature vector V, where the number of feature channels corresponding to Q, K, and V is d_k/N. Feature enhancement based on the global attention mechanism is then performed using the following formula (1):
  • Att(Q, K, V) = Softmax(Q·K^T / √(d_k/N))·V    (1)
  • where Softmax(·) represents the normalization function and · represents the vector dot product.
  • That is, the dot product result Q·K^T is obtained from the feature vectors Q and K and normalized using the number of feature channels d_k/N and the normalization function Softmax(·), giving the attention feature map Softmax(Q·K^T/√(d_k/N)) that indicates the global attention relationship. A dot product of the attention feature map with the feature vector V then gives Att(Q, K, V), the output result corresponding to the second image block feature group after feature enhancement based on the global attention mechanism.
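A minimal sketch of this global-attention feature enhancement for one second image block feature group follows. The values of L, d_k, and N and the random Q, K, V projections are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable normalization function Softmax(.)
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Att(Q, K, V) = Softmax(Q . K^T / sqrt(d_k / N)) . V, for one feature group.
rng = np.random.default_rng(0)
L, d_k, N = 4, 16, 2
d = d_k // N                               # feature channels per group
Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))

att_map = softmax(Q @ K.T / np.sqrt(d))    # L x L global attention feature map
out = att_map @ V                          # enhanced features for this group, L x d
print(out.shape)                           # (4, 8)
```

Each row of the attention feature map sums to 1, so every output feature is a weighted combination of all L block features, which is what makes the enhancement global.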
  • After the output results corresponding to the N second image block feature groups are obtained, they are fused in the channel dimension to obtain L fourth image block features after feature enhancement based on the global attention mechanism.
  • the third image block feature is enhanced based on the local attention mechanism to obtain the fifth image block feature, including: performing convolution processing on the third image block feature to obtain the fifth image block feature.
  • In this way, the fifth image block features including local information can be effectively obtained from the convolution result. Since the third image block features are one-dimensional, one-dimensional convolution processing may be performed on the third image block features to obtain the fifth image block features.
  • a third image block feature group is taken as an example, for any third image block feature group, one-dimensional convolution processing is performed on the L third image block features included in the third image block feature group to obtain the third image The convolution result corresponding to the block feature group.
  • the features of the third image block are enhanced based on the local attention mechanism to obtain the features of the fifth image block, including: according to the target convolution kernel and the expansion rate of the first channel, the third image block The features are subjected to convolution processing to obtain the features of the fifth image block.
  • the fifth image block feature can be obtained by performing one-dimensional convolution processing on the third image block feature according to the target convolution kernel and the expansion rate of the first channel.
  • Taking a third image block feature group as an example: for any third image block feature group, convolution processing is performed on the L third image block features included in the group according to the target convolution kernel and the first channel expansion rate as follows. First, one-dimensional pointwise convolution is performed on the L third image block features according to the first channel expansion rate E to obtain a first convolution sub-result, where the number of channels corresponding to the first convolution sub-result is d_k/N×E. Then, one-dimensional depthwise convolution is performed on the first convolution sub-result according to the target convolution kernel K to obtain a second convolution sub-result, where the number of channels corresponding to the second convolution sub-result is d_k/N×E. Finally, one-dimensional pointwise convolution is performed on the second convolution sub-result according to the first channel expansion rate E to obtain the convolution result corresponding to the third image block feature group, where the number of channels of the convolution result is d_k/N.
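The pointwise / depthwise / pointwise sequence above can be sketched for one feature group as follows. The weights are random stand-ins and the values of L, d (= d_k/N), E, and the kernel size K are assumptions; only the shapes follow the description:

```python
import numpy as np

# Hedged sketch of the local branch: 1-D pointwise conv (channel expansion E),
# depthwise 1-D conv with kernel size K over the block sequence, then pointwise
# conv back down to the group's channel number.
rng = np.random.default_rng(0)
L, d, E, K = 6, 4, 2, 3            # blocks, channels per group, expansion, kernel
x = rng.standard_normal((L, d))    # third image block features (one group)

W_up = rng.standard_normal((d, d * E))      # pointwise: d -> d*E channels
w_dw = rng.standard_normal((K, d * E))      # depthwise: one 1-D kernel per channel
W_down = rng.standard_normal((d * E, d))    # pointwise: d*E -> d channels

h = x @ W_up                                # first convolution sub-result: L x (d*E)
pad = np.pad(h, ((K // 2, K // 2), (0, 0))) # same-length padding along the sequence
h = sum(w_dw[k] * pad[k:k + L] for k in range(K))  # depthwise 1-D convolution
out = h @ W_down                            # convolution result: L x d
print(out.shape)                            # (6, 4)
```

The depthwise step mixes information only between neighboring blocks (within the kernel window), which is what restricts this branch to local information.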
  • After the convolution results corresponding to the N third image block feature groups are obtained, they are fused in the channel dimension to obtain L fifth image block features after feature enhancement based on the local attention mechanism.
  • Feature enhancement of the third image block features based on the local attention mechanism may also be implemented by reducing the size of the processed data, in addition to the above-mentioned one-dimensional convolution processing.
  • Taking a third image block feature group as an example: for any third image block feature group, the feature enhancement shown in formula (1) above is performed on only part of the L third image block features included in the group (for example, L/2 third image block features), so as to realize feature enhancement based on the local attention mechanism by reducing the size of the processed data.
  • the feature enhancement of the third image block based on the local attention mechanism may also adopt other processing forms according to actual needs, which is not limited in the present disclosure.
  • Determining the target image block feature corresponding to the target image includes: performing feature conversion on the fourth image block features and the fifth image block features according to the second channel expansion rate to obtain the target image block features.
  • The L fourth image block features and the L fifth image block features are fused in the channel dimension to obtain a fusion result, and feature conversion is then performed on the fusion result using the second channel expansion rate d_z, so that the channel number of the fusion result is converted to be the same as the target channel number corresponding to the first image block features before feature enhancement, thereby obtaining L target image block features whose channel number is d_k.
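The fusion and feature-conversion step can be sketched as follows. The shapes and the single linear conversion weight are illustrative assumptions standing in for the feed-forward sub-module:

```python
import numpy as np

# Illustrative sketch: fuse the fourth (global-branch) and fifth (local-branch)
# image block features in the channel dimension, then convert back to the
# target channel number d_k. The conversion weight is a random placeholder.
rng = np.random.default_rng(0)
L, d_k = 4, 8
fourth = rng.standard_normal((L, d_k // 2))   # global-branch output
fifth = rng.standard_normal((L, d_k // 2))    # local-branch output

fused = np.concatenate([fourth, fifth], axis=1)   # L x d_k fusion result
W_ffn = rng.standard_normal((d_k, d_k))           # feed-forward conversion (assumed)
target = fused @ W_ffn                            # L target image block features
print(target.shape)                               # (4, 8)
```

The key property is that each target feature mixes channels from both branches, so it carries global and local information at once.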
  • The value of the second channel expansion rate d_z may be determined according to actual conditions, which is not limited in the present disclosure.
  • the target image processing operation includes one of the following: target detection, target tracking, image recognition, and image classification.
  • the target image processing operation is performed by using the target image block features with higher semantic expression ability, so that the image processing results with higher precision can be obtained.
  • target image processing operations may also include other image processing operations according to actual image processing requirements, which are not limited in the present disclosure.
  • The image processing method is implemented through a self-attention neural network, wherein the self-attention neural network includes at least one attention module, and at least one attention module includes a global attention sub-module, a local attention sub-module, and a feed-forward sub-module. Enhancing the second image block features based on the global attention mechanism to obtain the fourth image block features includes: using the global attention sub-module to enhance the second image block features based on the global attention mechanism to obtain the fourth image block features.
  • Enhancing the third image block features based on the local attention mechanism to obtain the fifth image block features includes: using the local attention sub-module to enhance the third image block features based on the local attention mechanism to obtain the fifth image block features.
  • Determining the target image block feature corresponding to the target image according to the fourth image block features and the fifth image block features includes: using the feed-forward sub-module to perform feature conversion on the fourth image block features and the fifth image block features to obtain the target image block features.
  • the attention module includes a global attention sub-module, a local attention sub-module, and a feed-forward sub-module, so that the global attention sub-module can be used for feature enhancement based on the global attention mechanism, the local attention sub-module can be used for feature enhancement based on the local attention mechanism, and the feed-forward sub-module can be used to further perform feature conversion on the image block features after feature enhancement based on the global and local attention mechanisms.
  • in this way, target image block features containing both global information and local information are obtained, which effectively improves the semantic expression ability of the target image block features and thereby improves the network performance of the self-attention neural network.
  • Fig. 3 shows a network structure diagram of a self-attention neural network in the related art.
  • the self-attention neural network includes M attention modules (G-Block_1 to G-Block_M), and each attention module adopts a multi-head attention (Multi-Head Attention, MHA) mechanism.
  • taking G-Block_1 as an example, the L image block features with channel number d k are sent to each global attention sub-module, so that the feature enhancement based on the global attention mechanism shown in the above formula (1) is performed in each global attention sub-module to obtain the output result of each global attention sub-module; the output results of the global attention sub-modules are then fused in the channel dimension and input to the FFN sub-module for further feature conversion to obtain the output result of G-Block_1.
  • the self-attention neural network in the related art shown in Figure 3 includes only a global attention sub-module and an FFN sub-module in each attention module; therefore, it can only perform feature enhancement based on the global attention mechanism, which ignores the local information in the target image and makes the network performance of the self-attention neural network poor.
  • Fig. 4 shows a network structure diagram of a self-attention neural network according to an embodiment of the present disclosure.
  • the self-attention neural network proposed in this disclosure includes M attention modules (GL-Block_1 to GL-Block_M).
  • At least one attention module includes G global attention sub-modules, L local attention sub-modules and an FFN sub-module.
  • taking GL-Block_1 as an example, the number of target channels corresponding to GL-Block_1 is d k , with G = 1 global attention sub-module and L = 2 local attention sub-modules.
  • one part of the L image block features, with channel number d k /3, is sent to the global attention sub-module, and the remaining two parts, each with channel number d k /3, are sent to the two local attention sub-modules respectively.
  • the output of GL-Block_M is the image block feature after M times of global-local feature enhancement, which is the target image block feature corresponding to the above target image.
  • the self-attention neural network also includes an image processing module (the Head module shown in the figure); the target image block features output by GL-Block_M are input into the Head module to perform the corresponding target image processing operation and obtain the image processing result.
  • the Head module can be set according to actual image processing requirements, for example, it can be an image classification module, a target detection module, an image recognition module, an image tracking module, etc., which is not limited in the present disclosure.
  • both global information and local information can be used in the process of feature enhancement, so that the final target image block features have stronger semantic expression ability; after the target image block features are used to perform the target image processing operation on the target image, image processing results with higher precision can be obtained, thereby effectively improving the network performance of the self-attention neural network.
  • the local attention sub-module may include two pointwise convolutional layers, and a depthwise convolutional layer between the two pointwise convolutional layers.
  • the first pointwise convolution layer corresponds to the first channel expansion rate E to achieve E-fold expansion of the number of channels.
  • the depthwise convolution layer corresponds to the target convolution kernel K, which will not change the number of channels.
  • the last pointwise convolution layer restores the number of channels to the number of input channels.
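  • The pointwise–depthwise–pointwise structure of the local attention sub-module can be sketched as follows (a NumPy sketch under simplifying assumptions; the random weights and the fixed averaging depthwise kernel are illustrative stand-ins, not the disclosed parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise(x, w):
    # pointwise (1x1) convolution = per-block channel mixing: (L, Cin) @ (Cin, Cout)
    return x @ w

def depthwise(x, k):
    # per-channel 1-D convolution with kernel size k; channel count unchanged
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kern = np.full(k, 1.0 / k)  # illustrative fixed averaging kernel
    return np.stack([kern @ xp[i:i + k] for i in range(x.shape[0])])

def local_attention_submodule(x, E=2, K=3):
    c = x.shape[1]
    w_expand = rng.normal(size=(c, E * c))    # first pointwise layer: E-fold expansion
    w_restore = rng.normal(size=(E * c, c))   # last pointwise layer: restore channels
    h = pointwise(x, w_expand)
    h = depthwise(h, K)                       # target convolution kernel K
    return pointwise(h, w_restore)

x = rng.normal(size=(16, 8))
y = local_attention_submodule(x, E=2, K=3)
print(y.shape)  # (16, 8)
```

  • Note how the output channel count equals the input channel count, matching the description that the last pointwise layer restores the number of input channels.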
  • the feature enhancement process of the local attention sub-module based on the local attention mechanism is similar to the related description above.
  • the FFN sub-module corresponds to the expansion rate d z of the second channel, and is used to perform one-step feature conversion on the output results of each global attention sub-module and each local attention sub-module.
  • the feature conversion process of the FFN sub-module is similar to the related description above.
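  • The FFN sub-module's d z -fold channel expansion and restoration can be sketched as follows (random weights and the ReLU nonlinearity are illustrative assumptions for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

def ffn(x, d_z=4):
    # feed-forward sub-module: expand the channel count d_z-fold,
    # apply a nonlinearity, then project back to the input channel count
    c = x.shape[1]
    w1 = rng.normal(size=(c, d_z * c))
    w2 = rng.normal(size=(d_z * c, c))
    return np.maximum(x @ w1, 0.0) @ w2  # ReLU between the two projections

x = rng.normal(size=(16, 12))
y = ffn(x, d_z=4)
print(y.shape)  # (16, 12)
```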
  • for the self-attention neural network shown in Figure 4, when constructing the network, the module distribution hyperparameters corresponding to each attention module (the number G of global attention sub-modules and the number L of local attention sub-modules included in each attention module) and the module structure hyperparameters corresponding to each attention module (the number of target channels d k corresponding to each attention module, the target convolution kernel K and first channel expansion rate E corresponding to each local attention sub-module, and the second channel expansion rate d z corresponding to each FFN sub-module) are network hyperparameters that need to be considered.
  • the image processing method further includes: constructing a first network structure search space and a second network structure search space, wherein the first network structure search space includes a plurality of module distribution hyperparameters and the second network structure search space includes a plurality of module structure hyperparameters; determining target module distribution hyperparameters from the plurality of module distribution hyperparameters according to the first network structure search space, wherein the target module distribution hyperparameters are used to indicate the number of global attention sub-modules and the number of local attention sub-modules included in each attention module; determining target module structure hyperparameters from the plurality of module structure hyperparameters according to the target module distribution hyperparameters and the second network structure search space, wherein the target module structure hyperparameters are used to indicate the number of target channels corresponding to each attention module, the target convolution kernel and first channel expansion rate corresponding to each local attention sub-module, and the second channel expansion rate corresponding to each feed-forward sub-module; and constructing the self-attention neural network according to the target module distribution hyperparameters and the target module structure hyperparameters.
  • in this way, a hierarchical network structure search can be performed for the target module distribution hyperparameters and the target module structure hyperparameters, so that the size of the search space corresponding to each level of the network structure search is reduced, the search efficiency is effectively improved, and the self-attention neural network can be quickly constructed.
  • Table 1 shows examples of the first network structure search space and the second network structure search space.
  • the first network structure search space includes the possible distributions of global attention sub-modules and local attention sub-modules in each attention module.
  • the second network structure search space includes the module structure hyperparameters corresponding to each attention module, for example, the number of target channels d k corresponding to each attention module (optional items include 96, 192, 384), the target convolution kernel K corresponding to each local attention sub-module (optional items include 17, 31, 45), the first channel expansion rate E (optional items include 1, 2, 4), and the second channel expansion rate d z corresponding to each FFN sub-module (optional items include 2, 4, 6).
  • the optional items of the module distribution hyperparameters (G, L) and the module structure hyperparameters (K, E, d k , d z ) may include those shown in Table 1 above; other optional items, and the number of optional items corresponding to each hyperparameter, may be determined according to actual conditions, which is not limited in the present disclosure.
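  • The two search spaces of Table 1 could be represented and enumerated as follows (the (G, L) value range in the first space is an illustrative assumption; the second-space optional items mirror the example values above):

```python
from itertools import product

# First search space: candidate (G, L) module distributions (range is illustrative)
first_space = [(g, l) for g, l in product(range(4), repeat=2) if g + l > 0]

# Second search space: module structure hyperparameter options from Table 1
second_space = {
    "d_k": [96, 192, 384],  # target channel count per attention module
    "K":   [17, 31, 45],    # target convolution kernel of local attention sub-modules
    "E":   [1, 2, 4],       # first channel expansion rate
    "d_z": [2, 4, 6],       # second channel expansion rate of the FFN sub-module
}

def enumerate_space(space):
    # yield every combination of the structure hyperparameter options
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(enumerate_space(second_space))
print(len(first_space), len(configs))  # 15 81
```

  • Even this small example gives 81 structure combinations per attention module, which motivates searching the two spaces hierarchically rather than jointly.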
  • Fig. 5 shows a schematic diagram of a hierarchical network structure search according to an embodiment of the present disclosure.
  • first, a high-level network structure search is performed in the first network structure search space, and the target module distribution hyperparameters (G, L) are determined for each self-attention module from the multiple module distribution hyperparameters.
  • then, the target module distribution hyperparameters (G, L) in each self-attention module are fixed, and a low-level network structure search is performed in the second network structure search space to determine the target module structure hyperparameters (K, E, d k , d z ) for each self-attention module from the multiple module structure hyperparameters.
  • determining the target module distribution hyperparameters from the multiple module distribution hyperparameters according to the first network structure search space includes: constructing a first super network according to the first network structure search space, wherein the first super network includes multiple first optional network structures constructed according to the multiple module distribution hyperparameters; determining the first target network structure from the multiple first optional network structures by performing network training on the first super network; and determining the module distribution hyperparameters corresponding to the first target network structure as the target module distribution hyperparameters.
  • in this way, multiple first optional network structures can be constructed, and then a first super network including the multiple first optional network structures can be constructed; by training the first super network, the high-level network structure search is realized in the first network structure search space, the first target network structure with the best performance is determined from the multiple first optional network structures, and the module distribution hyperparameters corresponding to the first target network structure are determined as the target module distribution hyperparameters.
  • the first super network may be trained to obtain multiple trained first optional network architectures; the accuracy of the multiple trained first optional network architectures is verified based on an evolutionary algorithm, and a first preset number of preliminary first network architectures are selected from them.
  • the size of the first preset number may be determined according to actual conditions, which is not limited in the present disclosure.
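  • The evolutionary selection step described above can be sketched as follows (a toy sketch; the fitness function stands in for validation accuracy, and the population and generation sizes are illustrative assumptions):

```python
import random

random.seed(0)

def evolutionary_search(candidates, fitness, n_elite=4, pop=8, generations=5):
    # crude evolutionary selection: keep the fittest architectures each
    # generation and refill the pool with randomly drawn candidates
    population = random.sample(candidates, min(pop, len(candidates)))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[:n_elite]
        population = elite + random.choices(candidates, k=pop - n_elite)
    return max(population, key=fitness)

# toy stand-in: candidate (G, L) module distributions scored by a
# fake "validation accuracy" in place of real supernet evaluation
candidates = [(g, l) for g in range(4) for l in range(4) if g + l > 0]
best = evolutionary_search(candidates, fitness=lambda gl: 0.6 * gl[0] + 0.4 * gl[1])
print(best)
```

  • In practice the fitness of each sampled architecture would be its accuracy measured with weights inherited from the trained super network, rather than a fixed formula.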
  • determining the target module structure hyperparameters from the multiple module structure hyperparameters according to the target module distribution hyperparameters and the second network structure search space includes: constructing a second super network according to the target module distribution hyperparameters and the second network structure search space, wherein the second super network includes a plurality of second optional network structures constructed according to the plurality of module structure hyperparameters; determining the second target network structure from the plurality of second optional network structures by performing network training on the second super network; and determining the module structure hyperparameters corresponding to the second target network structure as the target module structure hyperparameters.
  • in this way, multiple second optional network structures can be constructed, and then a second super network including the multiple second optional network structures can be constructed; by training the second super network, the low-level network structure search is realized in the second network structure search space, the second target network structure with the best performance is determined from the multiple second optional network structures, and the module structure hyperparameters corresponding to the second target network structure are determined as the target module structure hyperparameters.
  • each local attention sub-module corresponds to the same target convolution kernel and first channel expansion rate.
  • the construction principle of the self-attention neural network may include: for the same attention module, each local attention sub-module in the attention module corresponds to the same target convolution kernel K and first channel expansion rate E.
  • the second optional network structures included in the second super network that do not conform to this principle can be deleted, so as to reduce the size of the second network structure search space and improve the efficiency of the subsequent search for the second target network structure.
  • after the second network structure search space is narrowed based on the construction principle of the self-attention neural network, the second super network includes all the second optional network structures that conform to the construction principle of the self-attention neural network.
  • the second super network is trained to realize the low-level network structure search in the second network structure search space, and the target module structure hyperparameters are determined.
  • the second super network may be trained to obtain multiple trained second optional network architectures; the accuracy of the multiple trained second optional network architectures is verified based on an evolutionary algorithm, and a second preset number of preliminary second network architectures are selected from them.
  • the size of the second preset number may be determined according to actual conditions, which is not limited in the present disclosure.
  • the self-attention neural network can be constructed.
  • the self-attention neural network includes at least one self-attention module, and at least one self-attention module includes a global attention sub-module and a local attention sub-module, so that when the self-attention neural network performs feature enhancement, both global information and local information can be used, effectively improving the network performance of the self-attention neural network.
  • the self-attention neural network of the present disclosure can be applied to image processing tasks such as target detection, target tracking, image recognition, image classification, etc., which is not limited in the present disclosure.
  • the present disclosure also provides image processing devices, electronic equipment, computer-readable storage media, and programs, all of which can be used to implement any image processing method provided in the present disclosure.
  • Fig. 6 shows a block diagram of an image processing device according to an embodiment of the present disclosure. As shown in Figure 6, the device 60 includes:
  • the feature determining part 61 is configured to determine the first image block feature corresponding to the target image, and divide the first image block feature into a second image block feature and a third image block feature;
  • the attention part 62 is configured to perform feature enhancement on the features of the second image block based on the global attention mechanism to obtain the features of the fourth image block, and to perform feature enhancement on the features of the third image block based on the local attention mechanism to obtain the fifth image block features;
  • the first determining part 63 is configured to determine a target image block feature corresponding to the target image according to the fourth image block feature and the fifth image block feature;
  • the target image processing part 64 is configured to perform a target image processing operation on the target image according to the feature of the target image block to obtain an image processing result.
  • the channel number corresponding to the feature of the first image block is the target channel number
  • the feature determination part 61 is further configured to divide the features of the first image block in the channel dimension according to the number of target channels and the target ratio to obtain the features of the second image block and the features of the third image block.
  • the attention part 62 is further configured to: determine a first feature vector, a second feature vector, and a third feature vector according to the second image block feature; determine an attention feature map according to the first feature vector and the second feature vector; and determine the fourth image block feature according to the attention feature map and the third feature vector.
  • the attention part 62 is further configured to perform convolution processing on the features of the third image block according to the target convolution kernel and the expansion rate of the first channel to obtain the features of the fifth image block.
  • the first determining part 63 is further configured to perform feature conversion on the fourth image block feature and the fifth image block feature according to the second channel expansion ratio to obtain the target image block feature.
  • the device 60 implements the image processing method through a self-attention neural network, wherein the self-attention neural network includes at least one attention module, and at least one attention module includes a global attention sub-module, Local attention sub-module and feed-forward sub-module;
  • the attention part 62 is also configured to: use the global attention sub-module to perform feature enhancement on the second image block feature based on the global attention mechanism to obtain the fourth image block feature; use the local attention sub-module to perform feature enhancement on the third image block feature based on the local attention mechanism to obtain the fifth image block feature; and use the feed-forward sub-module to perform feature conversion on the fourth image block feature and the fifth image block feature to obtain the target image block feature.
  • the device 60 also includes:
  • the search space construction part is configured to construct a first network structure search space and a second network structure search space, wherein the first network structure search space includes a plurality of module distribution hyperparameters, and the second network structure search space includes a plurality of module structure hyperparameters;
  • the second determining part is configured to determine the target module distribution hyperparameters from the plurality of module distribution hyperparameters according to the first network structure search space, wherein the target module distribution hyperparameters are used to indicate the number of global attention sub-modules and the number of local attention sub-modules included in each attention module;
  • the third determining part is configured to determine the target module structure hyperparameters from the plurality of module structure hyperparameters according to the target module distribution hyperparameters and the second network structure search space, wherein the target module structure hyperparameters are used to indicate the number of target channels corresponding to each attention module, the target convolution kernel and first channel expansion rate corresponding to each local attention sub-module, and the second channel expansion rate corresponding to each feed-forward sub-module;
  • the network construction part is configured to construct a self-attention neural network according to target module distribution hyperparameters and target module structure hyperparameters.
  • the second determining part is further configured to: construct a first super network according to the first network structure search space, wherein the first super network includes multiple first optional network structures constructed according to the multiple module distribution hyperparameters; determine the first target network structure from the multiple first optional network structures by performing network training on the first super network; and determine the module distribution hyperparameters corresponding to the first target network structure as the target module distribution hyperparameters.
  • the third determining part is configured to: construct a second super network according to the target module distribution hyperparameters and the second network structure search space, wherein the second super network includes a plurality of second optional network structures constructed according to the plurality of module structure hyperparameters; determine the second target network structure from the plurality of second optional network structures by performing network training on the second super network; and determine the module structure hyperparameters corresponding to the second target network structure as the target module structure hyperparameters.
  • each local attention sub-module corresponds to the same target convolution kernel and first channel expansion rate.
  • the functions or parts included in the device provided by the embodiments of the present disclosure may be configured to execute the methods described in the above method embodiments, and the implementation manner may refer to the descriptions of the above method embodiments.
  • Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and the above-mentioned method is implemented when the computer program instructions are executed by a processor.
  • Computer readable storage media may be volatile or nonvolatile computer readable storage media.
  • An embodiment of the present disclosure also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • An embodiment of the present disclosure also provides a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying computer-readable code; when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
  • Electronic devices may be provided as terminals, servers, or other forms of devices.
  • Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
  • electronic device 800 may include one or more of the following components: processing component 802, memory 804, power supply component 806, multimedia component 808, audio component 810, input/output (Input/Output, I/O) interface 812 , sensor component 814 , and communication component 816 .
  • the processing component 802 generally controls the overall operations of the electronic device 800, such as those associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 802 may include one or more components that facilitate interaction between processing component 802 and other components. For example, processing component 802 may include a multimedia portion to facilitate interaction between multimedia component 808 and processing component 802 .
  • the memory 804 is configured to store various types of data to support operations at the electronic device 800 . Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like.
  • the memory 804 can be realized by any type of volatile or non-volatile storage device or their combination, such as Static Random-Access Memory (Static Random-Access Memory, SRAM), Electrically Erasable Programmable Read-Only Memory (Electrically Erasable Programmable read only memory, EEPROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), programmable read-only memory (Programmable Read-Only Memory, PROM), read-only memory (Read-Only Memory, ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the power supply component 806 provides power to various components of the electronic device 800 .
  • Power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for electronic device 800 .
  • the multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user.
  • the screen may include a Liquid Crystal Display (LCD) and a Touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense a boundary of a touch or swipe action, but also detect duration and pressure associated with the touch or swipe action.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capability.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (microphone, MIC), and when the electronic device 800 is in an operation mode, such as a calling mode, a recording mode and a voice recognition mode, the microphone is configured to receive an external audio signal. Received audio signals may be further stored in memory 804 or sent via communication component 816 .
  • the audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: a home button, volume buttons, start button, and lock button.
  • Sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of electronic device 800 .
  • the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components (for example, the display and keypad of the electronic device 800); the sensor component 814 can also detect a change in position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • the sensor component 814 may also include an optical sensor, such as a CMOS (Complementary Metal Oxide Semiconductor, Complementary Metal Oxide Semiconductor) or a CCD (Charge-coupled Device, Charge-Coupled Device) image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • the electronic device 800 can access wireless networks based on communication standards, such as Wi-Fi, 2G (2-Generation wireless telephone technology, second-generation mobile communication technology) or 3G (3-Generation wireless telephone technology, third-generation mobile communication technology ), or a combination of them.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a Near Field Communication (NFC) portion to facilitate short-range communication.
  • the NFC part can be based on Radio Frequency Identification (RFID) technology, Infrared Data Association (Infrared Data Association, IrDA) technology, Ultra Wide Band (Ultra Wide Band, UWB) technology, Bluetooth (Bluetooth, BT) technology and other technology to achieve.
  • the electronic device 800 may be implemented by one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (Digital signal processing device , DSPD), programmable logic device (programmable logic device, PLD), field programmable gate array (Field Programmable Gate Array, FPGA), controller, microcontroller, microprocessor or other electronic components to implement, used to perform the above method.
  • a non-volatile computer-readable storage medium such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to implement the above method.
  • Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server.
  • electronic device 1900 includes processing component 1922 , which further includes one or more processors, and a memory resource represented by memory 1932 for storing instructions executable by processing component 1922 , such as application programs.
  • the application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above method.
  • Electronic device 1900 may also include a power supply component 1926 configured to perform power management of electronic device 1900, a wired or wireless network interface 1950 configured to connect electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Microsoft's server operating system (Windows Server™), Apple's graphical-user-interface-based operating system (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
  • a non-transitory computer-readable storage medium such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to implement the above method.
  • the present disclosure can be a system, method and/or computer program product.
  • a computer program product may include a computer readable storage medium having computer readable program instructions thereon for causing a processor to implement various aspects of the present disclosure.
  • a computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device.
  • a computer readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • Examples of computer-readable storage media include: portable computer disks, hard disks, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM or flash memory), Static Random Access Memory (SRAM), Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
  • computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., pulses of light through fiber optic cables), or transmitted electrical signals.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device.
  • Computer program instructions for performing operations of embodiments of the present disclosure may be assembly instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • Computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • electronic circuits, such as programmable logic circuits, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), can be personalized by utilizing state information of the computer-readable program instructions; such electronic circuits can execute the computer-readable program instructions, thereby implementing various aspects of the embodiments of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, produce an apparatus for realizing the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium; these instructions cause computers, programmable data processing devices, and/or other devices to work in a specific way, so that the computer-readable medium storing the instructions comprises an article of manufacture including instructions for implementing various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be realized by hardware, software or a combination thereof.
  • the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • Embodiments of the present disclosure relate to an image processing method and device, electronic equipment, and a storage medium.
  • the method includes: determining a first image block feature corresponding to a target image, and dividing the first image block feature into a second image block feature and a third image block feature; performing feature enhancement on the second image block feature based on a global attention mechanism to obtain a fourth image block feature, and performing feature enhancement on the third image block feature based on a local attention mechanism to obtain a fifth image block feature; determining a target image block feature corresponding to the target image according to the fourth image block feature and the fifth image block feature; and performing a target image processing operation on the target image according to the target image block feature to obtain an image processing result.
  • Embodiments of the present disclosure can improve the accuracy of image processing results.
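The split described above, one branch enhanced by global attention over all image patches and the other by attention restricted to local neighborhoods, can be illustrated with a minimal NumPy sketch. Everything here (the `dual_attention_block` function, the even channel split, the fixed `window` size, and the plain unparameterized dot-product attention) is an illustrative assumption for exposition, not the exact architecture claimed in this disclosure:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Plain dot-product self-attention over a set of patch tokens (n, d)."""
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def dual_attention_block(patch_feats, window=4):
    """Split the first image block feature along channels into a second
    (global) and third (local) part, enhance each with attention, and
    concatenate the fourth and fifth results into the target feature."""
    n, c = patch_feats.shape
    half = c // 2
    second, third = patch_feats[:, :half], patch_feats[:, half:]
    # global branch: every patch attends to every other patch
    fourth = self_attention(second)
    # local branch: patches attend only within fixed-size windows
    fifth = np.vstack([self_attention(third[i:i + window])
                       for i in range(0, n, window)])
    return np.concatenate([fourth, fifth], axis=1)
```

Because the global half attends across all patches while the local half attends only within its window, perturbing a distant patch changes a given patch's globally enhanced features but leaves its locally enhanced features untouched, which is the complementary behavior the two attention mechanisms are combined to exploit.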

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are an image processing method and apparatus, an electronic device, and a storage medium, the method comprising: determining first image block features corresponding to a target image, and dividing the first image block features into second image block features and third image block features (S11); performing feature enhancement on the second image block features based on a global attention mechanism to obtain fourth image block features, and performing feature enhancement on the third image block features based on a local attention mechanism to obtain fifth image block features (S12); determining target image block features corresponding to the target image according to the fourth image block features and the fifth image block features (S13); and performing a target image processing operation on the target image according to the target image block features to obtain an image processing result (S14).
PCT/CN2021/126096 2021-05-25 2021-10-25 Image processing method and apparatus, electronic device and storage medium WO2022247128A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110573055.1 2021-05-25
CN202110573055.1A CN113298091A (zh) 2021-05-25 2021-05-25 Image processing method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022247128A1 true WO2022247128A1 (fr) 2022-12-01

Family

ID=77325042

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/126096 WO2022247128A1 (fr) 2021-05-25 2021-10-25 Image processing method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113298091A (fr)
WO (1) WO2022247128A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298091A (zh) * 2021-05-25 2021-08-24 SenseTime Group Limited Image processing method and apparatus, electronic device and storage medium
CN114255221A (zh) * 2021-11-30 2022-03-29 Shanghai SenseTime Intelligent Technology Co., Ltd. Image processing and defect detection method and apparatus, electronic device and storage medium
CN114445892A (zh) * 2022-01-27 2022-05-06 Beijing Baidu Netcom Science and Technology Co., Ltd. Image detection method and apparatus
WO2023211248A1 (fr) * 2022-04-28 2023-11-02 Samsung Electronics Co., Ltd. Procédé et dispositif électronique pour la détermination d'une attention globale optimale dans un modèle d'apprentissage profond

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3246875A2 (fr) * 2016-05-18 2017-11-22 Siemens Healthcare GmbH Procédé et système d'enregistrement d'image à l'aide d'un agent artificiel intelligent
CN111080629A (zh) * 2019-12-20 2020-04-28 河北工业大学 一种图像拼接篡改的检测方法
CN112070670A (zh) * 2020-09-03 2020-12-11 武汉工程大学 全局-局部分离注意力机制的人脸超分辨率方法及系统
CN112418351A (zh) * 2020-12-11 2021-02-26 天津大学 基于全局与局部上下文感知的零样本学习图像分类方法
CN112784764A (zh) * 2021-01-27 2021-05-11 南京邮电大学 一种基于局部与全局注意力机制的表情识别方法及系统
CN113298091A (zh) * 2021-05-25 2021-08-24 商汤集团有限公司 图像处理方法及装置、电子设备和存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229490B (zh) * 2017-02-23 2021-01-05 Beijing SenseTime Technology Development Co., Ltd. Key point detection method, neural network training method, apparatus and electronic device
US11436725B2 * 2019-11-15 2022-09-06 Arizona Board Of Regents On Behalf Of Arizona State University Systems, methods, and apparatuses for implementing a self-supervised chest x-ray image analysis machine-learning model utilizing transferable visual words
CN111369442B (zh) * 2020-03-10 2022-03-15 Xidian University Remote sensing image super-resolution reconstruction method based on blur kernel classification and attention mechanism
CN112529042B (zh) * 2020-11-18 2024-04-05 Nanjing University of Aeronautics and Astronautics Medical image classification method based on dual-attention multi-instance deep learning
CN112419184B (zh) * 2020-11-19 2022-11-04 Chongqing University of Posts and Telecommunications Spatial attention image denoising method integrating local and global information


Also Published As

Publication number Publication date
CN113298091A (zh) 2021-08-24

Similar Documents

Publication Publication Date Title
WO2022247128A1 (fr) Image processing method and apparatus, electronic device and storage medium
US20210241117A1 (en) Method for processing batch-normalized data, electronic device and storage medium
CN110889469B (zh) Image processing method and apparatus, electronic device and storage medium
CN111783756B (zh) Text recognition method and apparatus, electronic device and storage medium
WO2022247103A1 (fr) Image processing method and apparatus, electronic device and computer-readable storage medium
CN111881956B (zh) Network training method and apparatus, target detection method and apparatus, and electronic device
CN110909815B (zh) Neural network training and image processing method, apparatus and electronic device
CN111539410B (zh) Character recognition method and apparatus, electronic device and storage medium
TWI759830B (zh) Network training method, image generation method, electronic device and computer-readable storage medium
CN111581488A (zh) Data processing method and apparatus, electronic device and storage medium
CN110659690B (zh) Neural network construction method and apparatus, electronic device and storage medium
WO2023098000A1 (fr) Image processing method and apparatus, defect detection method and apparatus, electronic device and storage medium
CN109685041B (zh) Image analysis method and apparatus, electronic device and storage medium
CN111242303A (zh) Network training method and apparatus, image processing method and apparatus
CN111259967A (zh) Image classification and neural network training method, apparatus, device and storage medium
CN112001364A (zh) Image recognition method and apparatus, electronic device and storage medium
CN110633715B (zh) Image processing method, network training method and apparatus, and electronic device
CN114168798B (zh) Text storage management and retrieval method and apparatus
WO2023024439A1 (fr) Behavior recognition method and apparatus, electronic device and storage medium
WO2022141969A1 (fr) Image segmentation method and apparatus, electronic device, storage medium and program
CN110070046B (zh) Face image recognition method and apparatus, electronic device and storage medium
CN112002313B (zh) Interaction method and apparatus, speaker, electronic device and storage medium
CN111582265A (zh) Text detection method and apparatus, electronic device and storage medium
CN112749709A (zh) Image processing method and apparatus, electronic device and storage medium
CN114168809B (zh) Similarity-based document string encoding matching method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942692

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21942692

Country of ref document: EP

Kind code of ref document: A1