WO2024032585A1 - Data processing method and apparatus, neural network model, device, and medium


Info

Publication number
WO2024032585A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2023/111669
Other languages
French (fr)
Chinese (zh)
Inventor
吴臻志
祝夭龙
Original Assignee
北京灵汐科技有限公司
Application filed by 北京灵汐科技有限公司
Publication of WO2024032585A1 publication Critical patent/WO2024032585A1/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and in particular, to a data processing method and device, a neural network model, electronic equipment, and a computer-readable storage medium.
  • neural networks have been widely used in image processing, video processing, speech processing, text processing and other fields.
  • feature extraction is usually required, and data processing is performed based on the extracted features.
  • the present disclosure provides a data processing method and device, a neural network model, electronic equipment, and a computer-readable storage medium.
  • the present disclosure provides a data processing method.
  • the data processing method includes: inputting data to be processed into a target neural network, performing data processing based on the spatial attention mechanism of the squeeze and excitation framework, and obtaining processing results; wherein the spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain the attention information of the features in the spatial dimension.
  • the present disclosure provides a neural network model, which is a model constructed based on the model parameters of a target neural network, wherein the target neural network is the target neural network described in any one of the embodiments of the present disclosure.
  • the present disclosure provides a data processing device.
  • the data processing device includes: a data processing module for inputting data to be processed into a target neural network, performing data processing based on the spatial attention mechanism of the squeeze and excitation framework, and obtaining processing results; wherein the spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain attention information of the features in the spatial dimension.
  • the present disclosure provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, and the one or more computer programs are executed by the at least one processor, so that the at least one processor can execute the above-mentioned data processing method.
  • the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned data processing method when executed by a processor/processing core.
  • in the embodiments of the present disclosure, the data to be processed is input into the target neural network, so that the target neural network performs data processing based on the spatial attention mechanism of the squeeze and excitation framework and obtains the processing results; this realizes compression of features from the channel dimension, which reduces the size of the features in the channel dimension and the amount of data to be processed, thereby improving task processing efficiency.
  • at the same time, by stimulating the correlation of the channel-compressed features in the spatial dimension, the attention information of the features in the spatial dimension can be obtained, thereby improving the accuracy of the processing results and, in turn, the accuracy of task processing.
  • Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure
  • Figure 2 is a flow chart of a data processing method provided by an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure.
  • Figure 4 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure.
  • Figure 8 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure.
  • Figure 9 is a schematic diagram of a neural network model provided by an embodiment of the present disclosure.
  • Figure 10 is a block diagram of a data processing device provided by an embodiment of the present disclosure.
  • Figure 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • Figure 12 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • the original data (for example, pictures, speech, text, video, etc.) is usually high-dimensional information, which contains much redundant information and may also be sparse. Therefore, processing directly based on the original data requires too much calculation, and the task execution efficiency is low.
  • relatively low-dimensional feature data is obtained by extracting features from original data, and then data processing is performed based on the feature data to reduce the amount of calculation.
  • the amount of feature data is still large.
  • correspondingly, the amount of calculation is large, and task execution efficiency may still fail to meet user needs.
  • embodiments of the present disclosure provide a data processing method and device, a neural network model, electronic equipment, and a computer-readable storage medium.
  • the data processing method according to the embodiments of the present disclosure can compress features from the channel dimension, thereby reducing the size of the features in the channel dimension and the amount of data to be processed, and thus improving task processing efficiency; at the same time, by stimulating the correlation of the channel-compressed features in the spatial dimension, the attention information of the features in the spatial dimension can be obtained, thereby improving the accuracy of the processing results and, in turn, the accuracy of task processing.
  • the data processing method according to the embodiment of the present disclosure can be executed by an electronic device such as a terminal device or a server.
  • the terminal device can be a user equipment (User Equipment, UE), a mobile device, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc., and the method can be implemented by a processor calling computer-readable program instructions stored in a memory. Alternatively, the method can be executed by a server.
  • a first aspect of the embodiment of the present disclosure provides a data processing method.
  • Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure. Referring to Figure 1, the method includes the following steps.
  • step S11 the data to be processed is input into the target neural network, the data is processed based on the spatial attention mechanism of the squeeze and excitation framework, and the processing results are obtained.
  • the spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain the attention information of the features in the spatial dimension.
  • the Squeeze-and-Excitation (SE) framework can be implemented by the Squeeze-and-Excitation network (SENet) structure.
  • SENet is mainly used to model the interdependence between channels of convolutional features to improve the representation effect of features in the channel dimension.
  • in SENet, the amount of data processing is reduced by squeezing features in the spatial dimension, and channel attention information is obtained by stimulating the correlation between channels; however, the attention information of the features in the spatial dimension is lost.
  • embodiments of the present disclosure provide a spatial attention mechanism based on a squeeze and excitation framework.
  • the squeeze and excitation framework acts in the spatial attention dimension and focuses on the attention information of features in the spatial dimension.
  • the corresponding implementation is to compress features in the channel dimension, and then stimulate the correlation in the spatial dimension based on the channel-compressed features, so as to obtain the attention information of the features in the spatial dimension.
  • the above spatial attention mechanism can be used to process at least one of image data, voice data, text data, and video data. That is, in step S11, the data to be processed may include at least one of image data, voice data, text data, and video data.
  • the target neural network processes the data based on the spatial attention mechanism of the squeeze and excitation framework to obtain corresponding processing results.
  • the target neural network can be used to perform at least one of image processing tasks, speech processing tasks, text processing tasks, and video processing tasks.
  • the processing results include at least one of image processing results, speech processing results, text processing results, and video processing results (where the processing may include operations such as identification, classification, and labeling), and are related to the type of the data to be processed, the content of the data to be processed, the tasks executed by the target neural network, and so on.
  • the embodiments of this disclosure place no restrictions on the tasks that the target neural network can perform, the corresponding data to be processed, and the processing results.
  • the data to be processed includes at least one image to be processed.
  • after the image to be processed is input into the target neural network, it is processed through some network structures (for example, convolutional layers) to extract the image features to be processed from the image to be processed.
  • the image features to be processed can include image features at different levels. The higher the level, the better the semantic representation effect of the image features. The lower the level, the better the spatial representation effect of the image features.
  • for the image features to be processed, the target neural network compresses them from the channel dimension to obtain the image channel context features, further stimulates the correlation of the image channel context features in the spatial dimension to obtain the image spatial attention features (which can characterize the attention information in the spatial dimension), and then fuses the image spatial attention features with the corresponding image features to be processed, thereby obtaining the target image features that can represent the spatial attention information.
  • the above target image features can be further used in image processing processes such as image classification, image annotation, and image recognition.
  • the data to be processed includes at least one image to be classified.
  • after the image to be classified is input into the target neural network, it is processed through some network structures (for example, convolutional layers) to extract the image features to be classified from the image to be classified.
  • the image features to be classified can include image features at different levels. The higher the level, the better the semantic representation effect of the image features. The lower the level, the better the spatial representation effect of the image features.
  • for the image features to be classified, the target neural network compresses them from the channel dimension to obtain the image channel context features, further stimulates the correlation of the image channel context features in the spatial dimension to obtain the image spatial attention features (which can characterize the attention information in the spatial dimension), and then fuses the image spatial attention features with the corresponding image features to be classified, thereby obtaining the target image features that can represent the spatial attention information.
  • the image classification results can be obtained through the corresponding image classification algorithm.
  • the images to be classified are essentially external technical data, and the series of technical processing implemented on them at least includes: inputting the images to be classified into the target neural network, performing data processing based on the spatial attention mechanism of the squeeze and excitation framework, and obtaining the image classification results.
  • a series of technical means are adopted.
  • the technical operations in these technical means include but are not limited to data input, feature extraction, feature compression, feature conversion, etc.
  • the above technical operations all correspond to corresponding technical features, and they are all computerized processing and operations of given data.
  • in this way, the characteristics of the images to be classified can be preserved as much as possible to ensure the accuracy of the image classification results. It can be seen that, although the technical solutions of the embodiments of the present disclosure do not directly change the processing performance, operation speed, or calculation accuracy of the hardware device, they change the resource overhead required by the hardware device when processing data such as images to be classified, thereby improving resource utilization. In other words, due to the reduction in resource overhead, some hardware devices that were originally unable to handle tasks such as image classification can perform the corresponding data processing, such as image classification, based on the data processing methods of the embodiments of the present disclosure, thereby reducing the requirements on hardware devices.
  • the above spatial attention mechanism corresponds to certain network layers or network structures in the target neural network, and the corresponding network layers or network structures implement the above processing process.
  • the target neural network includes a spatial attention module.
  • the spatial attention module is a module built according to the spatial attention mechanism based on the squeeze and excitation framework. It is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain the attention information of the features in the spatial dimension.
  • the spatial attention module itself can also include functional units with smaller granularity, and each functional unit has a certain connection relationship. Based on each functional unit and its connection relationship, the attention information of the feature in the spatial dimension can be obtained.
  • the spatial attention mechanism based on the squeeze and excitation framework is only one of the processing mechanisms used by the target neural network for data processing.
  • the target neural network can also perform data processing based on other data processing mechanisms.
  • the embodiments of the present disclosure place no restriction on this.
  • the spatial attention mechanism based on the squeeze and excitation framework processes the data to be processed; this can be one processing step in the entire data processing process, or it can correspond to the entire data processing process.
  • the target neural network may only include a spatial attention module, or may include other network layers or network structures in addition to the spatial attention module. This is not limited in the embodiments of the present disclosure.
  • in the embodiments of the present disclosure, the data to be processed is input into the target neural network, so that the target neural network performs data processing based on the spatial attention mechanism of the squeeze and excitation framework and obtains the processing results; this realizes compression of features from the channel dimension, which reduces the size of the features in the channel dimension and the amount of data to be processed, thereby improving task processing efficiency.
  • at the same time, by stimulating the correlation of the channel-compressed features in the spatial dimension, the attention information of the features in the spatial dimension can be obtained, thereby improving the accuracy of the processing results and, in turn, the accuracy of task processing.
  • FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present disclosure. Referring to Figure 2, the method includes the following steps.
  • step S21 characteristics to be processed are determined based on the data to be processed.
  • step S22 feature compression is performed on the features to be processed from the channel dimension to obtain channel context features.
  • step S23 feature conversion is performed on the channel context features to obtain spatial attention features.
  • step S24 feature fusion is performed on the features to be processed and the spatial attention features to obtain the target features.
  • step S25 the processing result is determined based on the target characteristics.
  • the feature to be processed, the channel context feature, the spatial attention feature, and the target feature have the same feature size in the spatial dimension.
  • the feature to be processed and the target feature have the same first feature size in the channel dimension; the channel context feature and the spatial attention feature have the same second feature size in the channel dimension; and the first feature size is larger than the second feature size.
  • in the embodiments of the present disclosure, the features to be processed can first be compressed in the channel dimension, and then transformed based on the compressed features to obtain the attention information of the spatial dimension; the attention information of the spatial dimension can then be integrated into the features to be processed to obtain the target features.
  • the features to be processed may be features obtained based on the data to be processed, and may have a certain corresponding relationship with the data to be processed.
  • in step S21, determining the features to be processed based on the data to be processed includes: directly performing feature extraction on the data to be processed to obtain the features to be processed; or first performing some data processing operations on the data to be processed to obtain intermediate data, and then performing feature extraction on the intermediate data to obtain the features to be processed; or first performing feature extraction on the data to be processed to obtain initial features, and then performing some data processing operations on the initial features to obtain the features to be processed.
  • after the features to be processed are obtained, the spatial attention mechanism based on the squeeze and excitation framework can be used to compress the features to be processed in the channel dimension, and stimulate the correlation of the channel-compressed features in the spatial dimension to obtain the attention information of the features to be processed in the spatial dimension.
  • the above processing process corresponds to step S22 to step S25 in the embodiment of the present disclosure.
  • step S22 is mainly used to compress the features to be processed in the channel dimension to reduce the amount of data processing.
  • the channel context feature is used to characterize the contextual relationship of the features to be processed in the channel dimension.
  • performing feature compression on the features to be processed from the channel dimension to obtain the channel context features includes: pooling the features to be processed in the channel dimension to obtain the channel context features.
  • pooling is an important concept in neural networks, and its essence is a downsampling processing method.
  • through pooling processing, the receptive field of a feature can be increased and the number of parameters can be reduced, while certain invariances of the feature are maintained (for example, rotation invariance, translation invariance, scaling invariance, etc.).
  • Common pooling processing methods include average pooling (Average Pooling), maximum pooling (Max Pooling) and global pooling (Global Pooling).
  • average pooling refers to averaging the feature points in the neighborhood
  • maximum pooling refers to taking the maximum value of the feature points in the neighborhood
  • global pooling refers to using the entire feature as a window for pooling processing, which can be used to reduce the feature dimension
  • global pooling is usually used in combination with average pooling and max pooling (for example, global average pooling, global max pooling, etc.).
  • features can be pooled through various pooling functions.
  • performing pooling processing on the features to be processed in the channel dimension to obtain channel context features includes: performing average pooling processing on the features to be processed in the channel dimension to obtain channel context features with a feature scale of n1 in the channel dimension.
  • wherein 1 ≤ n1 < N, N is the number of channels of the features to be processed, and N > 1.
  • performing pooling processing on the features to be processed in the channel dimension to obtain channel context features includes: performing maximum pooling processing on the features to be processed in the channel dimension to obtain channel context features with a feature scale of n2 in the channel dimension.
  • wherein 1 ≤ n2 < N, and N is the number of channels of the features to be processed.
  • since both n1 and n2 are smaller than N, after the above average pooling or maximum pooling processing, the feature scale of the features to be processed in the channel dimension is reduced (from N to n1, or from N to n2), and the corresponding data volume is also reduced, thereby reducing the task processing pressure.
  • when n1 and n2 are both integers greater than 1, there is still room for reduction in the feature scale of the features to be processed in the channel dimension.
  • in some optional implementations, n1 and n2 are both equal to 1, so that the features to be processed are compressed in the channel dimension to the greatest extent, minimizing the amount of data processing; in this case, the corresponding data processing amount is the minimum processing amount.
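  For illustration only (not part of the disclosure), the channel-dimension pooling described above can be sketched in PyTorch as follows; the (b, c, h, w) tensor layout, the example shapes, and the use of torch.mean / torch.max are assumptions of this sketch.

```python
import torch

# Features to be processed: (b, c, h, w) = (batch, channels, height, width).
x = torch.randn(2, 256, 28, 28)

# Average pooling over the channel dimension: compresses c -> 1 (n1 = 1 here),
# leaving the spatial size (h, w) unchanged.
avg_ctx = x.mean(dim=1, keepdim=True)        # (2, 1, 28, 28)

# Maximum pooling over the channel dimension: also compresses c -> 1 (n2 = 1 here).
max_ctx = x.max(dim=1, keepdim=True).values  # (2, 1, 28, 28)

print(avg_ctx.shape, max_ctx.shape)
```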
  • when channel context features are obtained based on pooling processing, the corresponding parameters are usually hyperparameters, and the channel context features obtained through pooling can retain only limited channel context information.
  • convolution processing can also achieve feature size compression, and various learnable parameters (for example, the weights of the convolution kernel) can be introduced as needed when performing convolution processing, so that the convolution-processed features can better retain feature information while being compressed.
  • therefore, in some optional implementations, the channel context features are obtained through convolution processing, so that the channel context features can better retain the channel context information.
  • performing feature compression on the feature to be processed from the channel dimension to obtain the channel context feature includes: performing convolution processing on the feature to be processed in the channel dimension to obtain the channel context feature.
  • the feature scale of the channel context feature in the channel dimension can be adjusted.
  • performing convolution processing on the feature to be processed in the channel dimension to obtain the channel context feature includes: convolving the feature to be processed in the channel dimension to obtain the channel context feature with a feature scale of n3 in the channel dimension.
  • wherein 1 ≤ n3 < N, and N is the number of channels of the features to be processed.
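  As an illustrative sketch only, a learnable channel compression can be realized with a 1*1 convolution mapping N input channels to n3 output channels; the 1*1 kernel size and the example values of N and n3 are assumptions of this sketch, since the embodiment only requires that the spatial size be preserved.

```python
import torch
from torch import nn

N, n3 = 256, 1  # number of input channels and compressed channel scale (example values)

# A 1*1 convolution compresses the channel dimension from N to n3 while keeping
# the spatial size unchanged; its weights are learnable, unlike pooling.
channel_compress = nn.Conv2d(N, n3, kernel_size=1)

x = torch.randn(2, N, 28, 28)          # features to be processed
channel_context = channel_compress(x)  # (2, n3, 28, 28)
print(channel_context.shape)
```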
  • in some optional implementations, performing feature conversion on the channel context features to obtain the spatial attention features includes: performing feature extraction on the channel context features in the spatial dimension to obtain first intermediate features; activating the first intermediate features to obtain second intermediate features; performing feature reduction processing on the second intermediate features to obtain third intermediate features; and activating the third intermediate features to obtain the spatial attention features.
  • in this way, spatial attention features that can represent the attention information of the spatial dimension can be obtained, which clarifies the processing objects (i.e., the channel context features), the processing dimension (i.e., the spatial dimension), and the information to be acquired (i.e., the attention information). On this basis, the attention information of the channel context features in the spatial dimension can be obtained in any suitable way, and the embodiments of the present disclosure do not limit this.
  • performing feature extraction on the channel context features in the spatial dimension to obtain the first intermediate features includes: performing a first convolution process on the channel context features in the spatial dimension to obtain the first intermediate features.
  • in an example, the first convolution corresponds to one convolution kernel, the size of the convolution kernel is 3*3, and the step size is 2.
  • the first convolution process does not change the feature scale of the channel context features in the channel dimension; it mainly squeezes the channel context features in the spatial dimension. Through this squeezing process, the amount of data processing can be reduced. However, considering the requirements for processing accuracy, in some optional implementations, the amount of data processing can be appropriately increased in exchange for higher processing accuracy.
  • Corresponding processing methods include using multiple convolution kernels to obtain the characteristics of multiple output channels, and then averaging in the channel dimension (Channel-mean) to improve processing accuracy.
  • performing feature extraction on the channel context features in the spatial dimension to obtain the first intermediate features includes: performing a second convolution process on the channel context features in the spatial dimension to obtain the fourth corresponding to multiple channels. Intermediate features; determine the average value of multiple fourth intermediate features in the channel dimension to obtain the first intermediate feature.
  • the second convolution process not only squeezes the channel context features in the spatial dimension, but also appropriately expands the channel dimension (that is, expands the number of channels), and finally obtains the first intermediate feature through channel averaging.
  • the second convolution corresponds to four convolution kernels, each convolution kernel corresponds to one channel, and the size of the convolution kernel is 7*7, the step size is 4, and the expansion coefficient is 2.
  • the channel context features are extended to four channels to obtain spatial attention information, and then based on the channel averaging method, the average value of the spatial attention information of the four channels is determined, thereby obtaining the first intermediate feature.
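  A rough sketch of this second convolution followed by channel averaging is given below; the padding value is an assumption chosen so that a 28*28 channel context feature maps to 7*7 under the stated kernel size 7*7, step size 4, and expansion (dilation) coefficient 2, and the shapes are example values.

```python
import torch
from torch import nn

# Second convolution: 1 input channel -> 4 output channels, 7*7 kernel,
# stride 4, dilation (expansion coefficient) 2. padding=6 is an assumed value
# that maps a 28*28 channel context feature to 7*7.
second_conv = nn.Conv2d(1, 4, kernel_size=7, stride=4, dilation=2, padding=6)

channel_context = torch.randn(2, 1, 28, 28)
fourth = second_conv(channel_context)     # (2, 4, 7, 7): fourth intermediate features
first = fourth.mean(dim=1, keepdim=True)  # (2, 1, 7, 7): channel mean -> first intermediate feature
print(fourth.shape, first.shape)
```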
  • nonlinear activation processing can be performed on the first intermediate feature to increase the nonlinear characteristics of the network and obtain better processing results.
  • in some optional implementations, activation processing is performed on the first intermediate features to obtain the second intermediate features, including: performing nonlinear activation on the first intermediate features based on a rectified linear unit (Rectified Linear Unit, ReLU) function to obtain the second intermediate features.
  • the channel context features are convolved to obtain the first intermediate features, and after activation processing, the second intermediate features are obtained.
  • during the convolution processing, the feature size of the output (i.e., an intermediate feature) usually becomes smaller.
  • in some cases, the reduced feature needs to be restored to its original size (i.e., the feature size of the channel context feature) for further calculations.
  • this is achieved by enlarging the feature size, that is, by mapping the feature resolution from small to large.
  • the operation of mapping from low resolution to high resolution is called upsampling.
  • Deconvolution (Transposed Convolution) processing is one of the implementation methods of upsampling.
  • a first deconvolution process may be performed on the second intermediate features to obtain spatial attention features with the same size as the channel context features.
  • performing feature reduction processing on the second intermediate feature to obtain the third intermediate feature includes: performing a first deconvolution process on the second intermediate feature to obtain the third intermediate feature.
  • in an example, the first deconvolution corresponds to one convolution kernel, the size of the convolution kernel is 3*3, and the step size is 2.
  • the corresponding processing method is to use multiple convolution kernels to obtain multiple output channels, and then take the average in the channel dimension.
  • in some optional implementations, performing feature reduction processing on the second intermediate features to obtain the third intermediate features includes: performing a second deconvolution process on the second intermediate features to obtain fifth intermediate features corresponding to multiple channels; and determining the average value of the multiple fifth intermediate features in the channel dimension to obtain the third intermediate features.
  • in an example, the second deconvolution corresponds to four convolution kernels, each convolution kernel corresponds to one channel, the size of the convolution kernel is 7*7, the step size is 4, and the expansion coefficient is 2.
  • in this way, the second intermediate features are expanded to four channels, each channel corresponds to a fifth intermediate feature, and then, based on the channel averaging method, the average of the four fifth intermediate features is determined, so as to obtain the third intermediate features.
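  As a sketch only, the second deconvolution and channel averaging can be mirrored as below; the padding and output_padding values are assumptions chosen so that a 7*7 second intermediate feature is restored to 28*28, and the shapes are example values.

```python
import torch
from torch import nn

# Second deconvolution (transposed convolution): 1 -> 4 channels, 7*7 kernel,
# stride 4, dilation 2. padding / output_padding are assumed values that
# restore a 7*7 feature to 28*28.
second_deconv = nn.ConvTranspose2d(1, 4, kernel_size=7, stride=4, dilation=2,
                                   padding=6, output_padding=3)

second = torch.randn(2, 1, 7, 7)          # second intermediate feature
fifth = second_deconv(second)             # (2, 4, 28, 28): fifth intermediate features
third = fifth.mean(dim=1, keepdim=True)   # (2, 1, 28, 28): channel mean -> third intermediate feature
print(fifth.shape, third.shape)
```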
  • nonlinear activation processing can be performed on the third intermediate feature, and, in order to facilitate subsequent calculations, normalization processing is also required.
  • the nonlinear activation processing and the normalization processing can be implemented by one function that has both nonlinear activation and normalization functions, or they can be implemented by a nonlinear activation function and a normalization function respectively; the embodiments of the present disclosure do not limit this.
  • in some optional implementations, activating the third intermediate features to obtain the spatial attention features includes: performing nonlinear normalized activation on the third intermediate features based on the Sigmoid function to obtain the spatial attention features. Since the Sigmoid function has both nonlinear activation and normalization functions, the spatial attention features can be obtained directly based on the Sigmoid function.
  • in some other optional implementations, activation processing is performed on the third intermediate features to obtain the spatial attention features, including: performing nonlinear activation on the third intermediate features based on the ReLU function, and normalizing the nonlinear activation result based on the normalized exponential (Softmax) function to obtain the spatial attention features.
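  For illustration, one way to realize "ReLU activation followed by Softmax normalization" is to apply the Softmax over the spatial positions of each feature map; taking the Softmax over the flattened h*w axis, as well as the example shapes, are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

third = torch.randn(2, 1, 28, 28)   # third intermediate feature

activated = F.relu(third)           # nonlinear activation
b, c, h, w = activated.shape
# Normalize over the spatial positions (an assumed choice of Softmax axis),
# so the attention weights of each feature map sum to 1.
attn = F.softmax(activated.view(b, c, h * w), dim=-1).view(b, c, h, w)
print(attn.shape)                   # (2, 1, 28, 28) spatial attention features
```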
  • the target features that can well represent the spatial attention information can be obtained.
  • step S24 performing feature fusion on the features to be processed and the spatial attention features to obtain the target features includes: adding the features to be processed and the spatial attention features point by point to obtain the target features.
  • step S24 performing feature fusion on the features to be processed and the spatial attention features to obtain the target features includes: multiplying the features to be processed and the spatial attention features point by point to obtain the target features.
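  A minimal illustration of the point-by-point fusion in step S24 follows, relying on broadcasting of the single attention channel over the c channels of the features to be processed; the shapes are example values and not fixed by the embodiment.

```python
import torch

features = torch.randn(2, 256, 28, 28)  # features to be processed (b, c, h, w)
attention = torch.rand(2, 1, 28, 28)    # spatial attention features (b, 1, h, w)

# Point-by-point multiplication: the single attention channel is broadcast
# over all c channels, so the target features keep the shape (b, c, h, w).
target_mul = features * attention

# Point-by-point addition is the alternative fusion mentioned in the text.
target_add = features + attention
print(target_mul.shape, target_add.shape)  # both (2, 256, 28, 28)
```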
  • step S25 the processing result can be determined based on the target characteristics.
  • determining the processing result based on the target characteristics includes: determining the processing result directly based on the target characteristics; or performing some data processing operations on the target characteristics to obtain the processing results.
  • the embodiment of the present disclosure implements the corresponding data processing method through the above steps.
  • the above method corresponds to some network layers or network structures in the target neural network, and these network layers or network structures implement the above data processing method.
  • FIG 3 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure.
  • the target neural network includes: a first network structure, a spatial attention module and a second network structure, where the spatial attention module includes a context modeling unit, a conversion unit and a fusion unit.
  • the data to be processed is input into the target neural network
  • the first network structure located before the spatial attention module processes the data to be processed, obtains the features to be processed, and uses the features to be processed as the input data of the spatial attention module; the context modeling unit compresses the features to be processed from the channel dimension to obtain the channel context features, and inputs the channel context features into the conversion unit; the conversion unit performs feature conversion on the channel context features to obtain the spatial attention features, and inputs the spatial attention features into the fusion unit; the fusion unit fuses the features to be processed and the spatial attention features through point-by-point multiplication or point-by-point addition to obtain the target features, and inputs the target features into the second network structure.
  • the second network structure performs corresponding data processing based on the target characteristics to obtain processing results.
  • the first network structure is used to perform the processing process of step S21
  • the context modeling unit is used to perform the processing process of step S22
  • the conversion unit is used to perform the processing process of step S23.
  • the fusion unit is used to perform the processing of step S24
  • the second network structure is used to perform the processing of step S25.
  • the first network structure and the second network structure are abstract network structures, and their internal structures may be the same or different; the embodiments of the present disclosure do not limit this. Further, in some optional implementations, the first network structure and the second network structure can be set according to task processing requirements, statistical data, experience, and other information.
  • the first network structure may include any one or more of convolutional layers, pooling layers, connection layers, activation layers, etc.
  • the second network structure may also include any one or more of network layers such as convolutional layers, pooling layers, connection layers, and activation layers.
  • FIG 3 only shows the framework structure of the target neural network relatively simply from the functional level.
  • each of the above network structures or modules can optionally be composed of more fine-grained functional units.
  • for example, the spatial attention module in the target neural network may be composed of the various functional units shown in Figure 4.
  • FIG 4 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure.
  • the spatial attention module includes: a context modeling unit, a conversion unit and a fusion unit, where the conversion unit includes a feature extraction layer, a first activation layer, a feature reduction layer and a second activation layer.
  • the context modeling unit first compresses them from the channel dimension to obtain the channel context features, and inputs the channel context features into the feature extraction layer.
  • the feature extraction layer performs feature extraction on the channel context feature in the spatial dimension, obtains the first intermediate feature, and inputs the first intermediate feature into the first activation layer;
  • the first activation layer activates the first intermediate feature, and obtains the second intermediate features, and input the second intermediate features into the feature reduction layer;
  • the feature reduction layer performs feature reduction processing on the second intermediate features, obtains the third intermediate features, and inputs the third intermediate features into the second activation layer;
  • the second activation layer activates the third intermediate features to obtain the spatial attention features, and inputs the spatial attention features into the fusion unit;
  • the fusion unit performs feature fusion on the features to be processed and the spatial attention features by point-by-point multiplication or point-by-point addition to obtain the target features, and outputs the target features so that other network structures of the target neural network can perform data processing based on the target features.
  • feature compression can be achieved by pooling processing
  • feature extraction can be achieved by convolution processing
  • feature restoration can be achieved by deconvolution processing
  • feature activation can be achieved by the corresponding activation function.
  • FIG. 5 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure.
  • the spatial attention module mainly includes: a global average pooling (GAP) layer, a first convolution layer, a ReLU activation layer, a first deconvolution layer, and a Sigmoid activation layer.
  • the feature to be processed is a four-dimensional tensor (b, c, h1, w1), where b represents the number of features to be processed, c represents the number of channels of the feature to be processed, h1 and w1 respectively Represents the height and width of the feature to be processed.
  • after the features to be processed are input into the spatial attention module, the global average pooling layer performs global average pooling on the features to be processed to obtain the channel context features. Since this is a global pooling process, the channel context feature is a global channel context feature, and the tensor size corresponding to the channel context feature is (b, 1, h1, w1). In other words, the global average pooling layer compresses the number of channels of the features to be processed from c to 1, but does not change their feature size in the spatial dimension.
  • after the channel context features are obtained, the first convolution layer performs convolution processing on the channel context features and extracts the first intermediate features.
  • the tensor size corresponding to the first intermediate features is (b, 1, h2, w2), where h2 < h1 and w2 < w1.
  • the first convolution layer can use a convolution kernel to squeeze the channel context features in the spatial dimension (the height is squeezed from h1 to h2, and the width is squeezed from w1 to w2) to obtain the first intermediate features with a smaller size in the spatial dimension.
  • the ReLU activation layer performs nonlinear activation processing on the first intermediate feature to obtain the second intermediate feature, and the corresponding tensor size is (b, 1, h2, w2).
  • after the second intermediate features are obtained, the first deconvolution layer performs deconvolution processing on the second intermediate features to expand them in the spatial dimension and obtain the third intermediate features.
  • the corresponding tensor size is (b, 1, h1, w1).
  • the height of the second intermediate feature in the spatial dimension is expanded from h2 to h1, and the width is expanded from w2 to w1, while keeping the number of channels unchanged.
  • the third intermediate feature is nonlinearly activated and normalized through the Sigmoid activation layer to obtain the spatial attention feature, and the corresponding tensor size is (b, 1, h1, w1).
  • the spatial attention features are multiplied point by point into the features to be processed to obtain the target features.
  • the tensor size corresponding to the target features is (b, c, h1, w1).
  • the target feature is a feature that incorporates spatial attention information.
  • in an example, the first convolution layer corresponds to one convolution kernel with a size of 3*3 and a step size of 2, and the first deconvolution corresponds to one convolution kernel with a size of 3*3 and a step size of 2.
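  For reference, a minimal PyTorch sketch of this simple spatial attention module (channel-dimension global average pooling, Conv3*3 with step size 2, ReLU, TransposedConv3*3 with step size 2, Sigmoid, point-by-point multiplication) is given below; the padding and output_padding values are assumptions chosen so that 28*28 features are squeezed to 14*14 and restored to 28*28, and are not specified by the embodiment.

```python
import torch
from torch import nn

class SimpleSpatialAttention(nn.Module):
    """Sketch of the simple spatial attention module (cf. Figure 5)."""

    def __init__(self):
        super().__init__()
        # Conv3*3, stride 2: squeezes the channel context feature in the spatial dimension.
        self.conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # TransposedConv3*3, stride 2: restores the original spatial size.
        self.deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2,
                                         padding=1, output_padding=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                      # x: features to be processed (b, c, h, w)
        ctx = x.mean(dim=1, keepdim=True)      # average pooling over channels -> (b, 1, h, w)
        mid = self.relu(self.conv(ctx))        # first / second intermediate features (b, 1, h/2, w/2)
        attn = self.sigmoid(self.deconv(mid))  # spatial attention features (b, 1, h, w)
        return x * attn                        # target features (b, c, h, w)

x = torch.randn(2, 256, 28, 28)
print(SimpleSpatialAttention()(x).shape)       # torch.Size([2, 256, 28, 28])
```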
  • taking features to be processed with a shape of (b, 256, 28, 28) as an example, the features to be processed are input into the spatial attention module, and the global average pooling layer aggregates the information of all channel dimensions element by element to obtain channel context features with a shape of (b, 1, 28, 28); then the first convolution layer performs Conv3*3 convolution processing with a step size of 2 to achieve feature compression in the spatial dimension and obtain the first intermediate features with a shape of (b, 1, 14, 14); the ReLU activation layer performs nonlinear activation processing on the first intermediate features to obtain the second intermediate features with a shape of (b, 1, 14, 14); the first deconvolution layer performs TransposedConv3*3 deconvolution processing with a step size of 2 to achieve feature expansion in the spatial dimension and obtain the third intermediate features with a shape of (b, 1, 28, 28); after nonlinear activation and normalization of the third intermediate features by the Sigmoid activation layer, spatial attention features with a shape of (b, 1, 28, 28) are obtained; finally, the spatial attention features are multiplied point by point into the features to be processed to obtain the target features with a shape of (b, 256, 28, 28).
  • after the global average pooling, the number of channels is only 1, so the subsequent Conv3*3 and TransposedConv3*3 require much less calculation (taking features to be processed of shape (b, 256, 28, 28) as an example, since the number of channels is reduced by a factor of 256, the amount of calculation is reduced by at least 256 times), which can effectively improve task processing efficiency.
  • in contrast, channel squeezing and excitation based on SENet are usually based on a full connection over the channels, that is, each output channel is calculated from all input channels, regardless of whether the number of channels changes.
  • each pixel in the target features is calculated from only 3*3 pixels in the features to be processed, and the receptive field size is about 7*7 (calculated with a convolution kernel size of 3*3 and a step size of 2), resulting in a relatively limited range of spatial attention.
  • the size of the above-mentioned receptive field can be appropriately increased to expand the scope of spatial attention, thereby improving task processing accuracy. It should be understood that increasing the size of the receptive field usually leads to an increase in the amount of calculation. In other words, the increase in the accuracy of the above task processing may be achieved at the expense of some processing capabilities.
  • the spatial attention module shown in Figure 5 is regarded as a simple version; by improving or strengthening some of its network layers, an enhanced spatial attention module with a larger range of spatial attention can be obtained.
  • the global average pooling layer in Figure 5 is replaced by a global convolution layer, and the learnable parameters of the convolution are used to better retain the information of the features to be processed.
  • the first convolution layer in Figure 5 is replaced with a second convolution layer and a first channel average layer, where the second convolution layer uses a larger convolution kernel and corresponds to multiple output channels.
  • the first channel averaging layer averages the features of each output channel to obtain the corresponding first intermediate features.
  • the first deconvolution layer in Figure 5 is replaced with a second deconvolution layer and a second channel average layer, where the second deconvolution layer can use a larger convolution kernel and corresponds to multiple output channels, and the second channel averaging layer averages the features of each output channel to obtain the corresponding third intermediate features.
  • any one or more of the above improvements can be implemented on the basis of the simple version of the spatial attention module in order to increase the scope of spatial attention.
  • FIG. 6 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure, which is an enhanced version of the spatial attention module.
  • the spatial attention module includes: a global convolution layer, a feature extraction layer, a ReLU activation layer, a feature reduction layer and a Sigmoid activation layer, where the feature extraction layer includes a second convolution layer and a first channel average layer, The feature reduction layer includes a second deconvolution layer and a second channel averaging layer.
  • the feature to be processed is a four-dimensional tensor (b, c1, h1, w1), where b represents the number of features to be processed, c1 represents the number of channels of the feature to be processed, h1 and w1 respectively Represents the height and width of the feature to be processed.
  • after the features to be processed are input into the spatial attention module, the global convolution layer first performs global convolution processing on the features to be processed to obtain the channel context features. Since this is a global convolution process, the channel context feature is a global channel context feature, and the tensor size corresponding to the channel context feature is (b, 1, h1, w1). In other words, the global convolution layer compresses the number of channels of the features to be processed from c1 to 1 without changing their feature size in the spatial dimension; at the same time, the feature information is better retained through the learnable parameters in the global convolution layer.
  • the channel context features are processed through a feature extraction layer composed of a second convolution layer and a first channel average layer to obtain the first intermediate features.
  • the second convolution layer performs convolution processing on the channel context features to obtain the fourth intermediate features of multiple channels
  • the first channel average layer calculates the average value of the multiple fourth intermediate features in the channel dimension to obtain the first intermediate features.
  • the tensor size corresponding to the fourth intermediate feature of multiple channels is (b, c2, h3, w3)
  • the tensor size corresponding to the first intermediate features is (b, 1, h3, w3), where 1 < c2 < c1, h3 < h1, and w3 < w1.
  • the second convolutional layer obtains the fourth intermediate features of c2 channels based on the channel context features of 1 channel, and at the same time squeezes the channel context features in the spatial dimension (the height is squeezed from h1 to h3, and the width is squeezed from w1 to w3); the first channel averaging layer calculates the average of the above-mentioned plurality of fourth intermediate features in the channel dimension to obtain the first intermediate features.
  • the ReLU activation layer performs nonlinear activation processing on the first intermediate feature to obtain the second intermediate feature, and the corresponding tensor size is (b, 1, h3, w3).
  • the second intermediate feature is processed through a feature reduction layer composed of a second deconvolution layer and a second channel average layer to obtain a third intermediate feature.
  • the second intermediate features are deconvolved by the second deconvolution layer to obtain the fifth intermediate features of multiple channels, and then the second channel average layer calculates the average of the multiple fifth intermediate features in the channel dimension to obtain the third intermediate features.
  • the tensor size corresponding to the fifth intermediate feature of multiple channels is (b, c3, h1, w1)
  • the tensor size corresponding to the third intermediate features is (b, 1, h1, w1), where 1 < c3 < c1, and c3 and c2 can be the same or different.
  • the second deconvolution layer obtains the fifth intermediate features of c3 channels based on the second intermediate features of 1 channel, and at the same time expands the second intermediate features in the spatial dimension (the height is expanded from h3 to h1, and the width is expanded from w3 to w1); the second channel averaging layer calculates the average of the above-mentioned plurality of fifth intermediate features in the channel dimension to obtain the third intermediate features.
  • the third intermediate feature is nonlinearly activated and normalized through the Sigmoid activation layer to obtain the spatial attention feature, and the corresponding tensor size is (b, 1, h1, w1).
  • the spatial attention features are multiplied point by point into the features to be processed to obtain the target features.
  • the tensor size corresponding to the target features is (b, c1, h1, w1).
  • in an example, the second convolution layer corresponds to four convolution kernels, each convolution kernel corresponds to one channel, the convolution kernel size is 7*7, the step size is 4, and the expansion coefficient is 2; the second deconvolution also corresponds to four convolution kernels, each convolution kernel corresponds to one channel, the size of the convolution kernel is 7*7, the step size is 4, and the expansion coefficient is 2.
  • taking features to be processed with a shape of (b, 256, 28, 28) as an example, the features to be processed are input into the spatial attention module, and the global convolution layer performs feature compression on the features to be processed in the channel dimension to obtain channel context features with a shape of (b, 1, 28, 28); then the second convolution layer uses convolution kernels corresponding to 4 channels to perform Conv7*7 convolution processing with a step size of 4 and an expansion coefficient of 2, so that the channel context features are expanded to 4 channels in the channel dimension and compressed in the spatial dimension, obtaining the fourth intermediate features; the first channel average layer calculates the average of the above-mentioned fourth intermediate features in the channel dimension (that is, it averages the feature points at the same spatial position in the four channels) to obtain the first intermediate features with a shape of (b, 1, 7, 7); the ReLU activation layer performs nonlinear activation processing on the first intermediate features to obtain the second intermediate features with a shape of (b, 1, 7, 7); then the second deconvolution layer uses convolution kernels corresponding to 4 channels to perform TransposedConv7*7 deconvolution processing with a step size of 4 and an expansion coefficient of 2 to expand the second intermediate features in the spatial dimension, and the second channel average layer averages the result in the channel dimension to obtain the third intermediate features with a shape of (b, 1, 28, 28); after nonlinear activation and normalization by the Sigmoid activation layer, spatial attention features with a shape of (b, 1, 28, 28) are obtained; finally, the spatial attention features are multiplied point by point into the features to be processed to obtain the target features with a shape of (b, 256, 28, 28).
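  Similarly, a hedged sketch of the enhanced spatial attention module (cf. Figure 6) is shown below. The use of a 1*1 convolution to realize the "global convolution layer", and the padding / output_padding values, are assumptions of this sketch; the embodiment only fixes the 7*7 kernel, step size 4, expansion coefficient 2, and the 4 output channels.

```python
import torch
from torch import nn

class EnhancedSpatialAttention(nn.Module):
    """Sketch of the enhanced spatial attention module (cf. Figure 6)."""

    def __init__(self, in_channels=256):
        super().__init__()
        # "Global convolution layer": learnable compression of c1 channels to 1
        # channel; a 1*1 convolution is an assumed realization.
        self.global_conv = nn.Conv2d(in_channels, 1, kernel_size=1)
        # Second convolution: 1 -> 4 channels, Conv7*7, stride 4, dilation 2.
        self.conv = nn.Conv2d(1, 4, kernel_size=7, stride=4, dilation=2, padding=6)
        self.relu = nn.ReLU(inplace=True)
        # Second deconvolution: 1 -> 4 channels, TransposedConv7*7, stride 4, dilation 2.
        self.deconv = nn.ConvTranspose2d(1, 4, kernel_size=7, stride=4, dilation=2,
                                         padding=6, output_padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                              # x: (b, c1, h, w)
        ctx = self.global_conv(x)                      # channel context features (b, 1, h, w)
        fourth = self.conv(ctx)                        # fourth intermediate features (b, 4, h', w')
        first = fourth.mean(dim=1, keepdim=True)       # channel mean -> first intermediate feature
        second = self.relu(first)                      # second intermediate feature
        fifth = self.deconv(second)                    # fifth intermediate features (b, 4, h, w)
        third = fifth.mean(dim=1, keepdim=True)        # channel mean -> third intermediate feature
        attn = self.sigmoid(third)                     # spatial attention features (b, 1, h, w)
        return x * attn                                # target features (b, c1, h, w)

x = torch.randn(2, 256, 28, 28)
print(EnhancedSpatialAttention()(x).shape)             # torch.Size([2, 256, 28, 28])
```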
  • during the processing of the simple version of the spatial attention module, the range of spatial attention is about 7*7, which is small compared with the 28*28 spatial dimension of the feature; during the processing of the enhanced version of the spatial attention module, the convolution kernel with a size of 7*7, an expansion coefficient of 2, and a step size of 4 has a receptive field of about 53*53.
  • the spatial attention module shown in Figures 4 to 6 above corresponds to the SE Block structure in SENet. Based on the spatial attention module shown in Figure 3, a variety of deformations can be performed to obtain a variety of variant structures of SENet, and these variants can also be used to implement the spatial attention mechanism to obtain spatial attention information.
  • Figure 7 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure, which belongs to one of the variants of SENet (ie Simplified NL Block).
  • the tensor size corresponding to the feature to be processed is (b, c, h1, w1)
  • the context modeling unit includes a third convolution layer and a normalization layer, which is used to Perform feature compression on the features to be processed from the channel dimension to obtain the channel context features.
  • the corresponding tensor size is (b, 1, h1, w1); the fourth convolution layer corresponds to the conversion unit, which is used to perform feature conversion on the channel context features.
  • the fusion unit is implemented by a point-by-point multiplier, which is used to multiply the spatial attention features into the features to be processed point by point to obtain the target Features, the tensor size of the target feature is (b, c, h1, w1).
  • SENet global context modeling framework
  • GC Block global context modeling framework
  • the processing process is similar , both of which first compress the features to be processed from the channel dimension to obtain the channel context features, then stimulate the channel context features in the spatial dimension to obtain the spatial attention features, and finally fuse the spatial attention features with the features to be processed to obtain the target feature.
  • the above spatial attention mechanism can be used in combination with the channel attention mechanism to further improve the accuracy of task processing results.
  • the processing result may be determined by attention information in the spatial dimension and attention information in the channel dimension.
  • the attention information in the spatial dimension is obtained based on the spatial attention mechanism
  • the attention information in the channel dimension is obtained based on Channel attention mechanism is obtained.
  • combining the two mechanisms can simultaneously obtain the attention information of the spatial dimension and the attention information of the channel dimension, and the representation effect of the features can be further improved, and correspondingly, the accuracy of task processing results can also be improved.
  • Figure 8 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure.
  • the target neural network includes a first network structure, a spatial attention module, a channel attention module, a fusion module and a second network structure, where the spatial attention module is a module set based on the spatial attention mechanism for To obtain attention information in the spatial dimension, the channel attention module is a module set based on the channel attention mechanism and is used to obtain attention information in the channel dimension.
  • the data to be processed is input into the target neural network, and the first network structure located before the spatial attention module and the channel attention module processes the data to be processed, obtains the features to be processed, and adds the features to be processed.
  • the features are processed as input data to the spatial attention module and the channel attention module.
  • the spatial attention module For the spatial attention module, it performs feature compression on the features to be processed from the channel dimension through the first context modeling unit, obtains the channel context features, and inputs the channel context features into the first conversion unit; the first conversion unit The features are transformed into features to obtain spatial attention features, and the spatial attention features are input into the first fusion unit; the first fusion unit combines the features to be processed with the spatial attention features by point-by-point multiplication or point-by-point addition. Fusion, obtain the first target feature, and input the first target feature into the fusion module.
  • the channel attention module performs feature compression on the features to be processed from the spatial dimension through the second context modeling unit, obtains the spatial context features, and inputs the spatial context features into the second conversion unit; the second conversion unit The spatial context features are transformed into features to obtain channel attention features, and the channel attention features are input into the second fusion unit; the second fusion unit multiplies the features to be processed and the channel attention features by point-by-point multiplication or point-by-point addition. Perform feature fusion to obtain the second target feature, and input the second target feature into the fusion module.
  • the fusion module further fuses the first target feature and the second target feature to obtain both spatial attention information and
  • the fusion feature of the channel attention information is input into the second network structure, and the second network structure performs corresponding data processing based on the fusion feature to obtain the processing result.
  • the spatial attention module and the channel attention module act on the same feature to be processed to simultaneously enhance its feature expression effect from the spatial dimension and the channel dimension.
  • the spatial attention module and the channel attention module can also act on different features to be processed, so as to adopt different processing methods for different features to be processed.
  • the spatial attention module acts on the first feature to be processed to obtain the attention information of the first feature to be processed in the spatial dimension; the channel attention module acts on the second feature to be processed to obtain the second feature to be processed. Process the attentional information of features in the channel dimension.
  • the spatial attention module includes a naive version and an enhanced version.
  • the channel attention module can also include a naive version and an enhanced version.
  • the naive versions of both can be used at the same time, or the enhanced versions of both can be used at the same time, or the naive version of one of them can be used and the enhanced version of the other can be used.
  • the embodiments of the present disclosure are suitable for This is not a restriction.
  • the spatial attention module and the channel attention module can use various SENet (including corresponding variants) structures, and the spatial attention module and the channel attention module can use the same SENet structure or different ones. SENet structure, the embodiment of the present disclosure does not limit this.
  • the processing processes of the spatial attention module and the channel attention module are relatively independent, and data processing can be performed without relying on the processing results of the other party. Therefore, the spatial attention module and the channel attention module
  • the processing procedures of the modules can be executed simultaneously or sequentially, and the embodiments of the present disclosure do not limit this.
  • the above-mentioned spatial attention module and channel attention module can be carried by different hardware devices, or can be carried by the same hardware device.
  • the processing processes of the spatial attention module and the channel attention module can be executed sequentially, or by establishing two processes, they can be executed simultaneously in the two processes.
  • the processing processes of the spatial attention module and the channel attention module can be executed by the respective hardware devices at the same time, or can be executed in sequence.
  • a second aspect of the embodiment of the present disclosure provides a neural network model.
  • Figure 9 is a schematic diagram of a neural network model provided by an embodiment of the present disclosure.
  • the neural network model is a model constructed based on the model parameters of the target neural network, wherein the target neural network adopts the target neural network in any one of the embodiments of the present disclosure.
  • the neural network model can be used to perform at least one of image processing tasks, speech processing tasks, text processing tasks, and video processing tasks. No matter what kind of task the neural network model performs, it needs to obtain the attention information of the features in the spatial dimension during the execution process. Based on this, the neural network model includes the following steps during the execution of the task: compress the features from the channel dimension, Excite the correlation of the channel-compressed features in the spatial dimension to obtain the attention information of the features in the spatial dimension.
  • the structure of the neural network model may be different for different types of tasks, but no matter how its structure changes, it includes functional modules for executing the spatial attention mechanism.
  • an initial neural network model is built based on the task to be processed.
  • the model parameters are initial parameters.
  • the task to be processed is executed directly based on the initial neural network model, the task Processing accuracy is low.
  • the model parameters of the target neural network are used to update the corresponding parameters in the initial neural network model to obtain a neural network model with higher accuracy.
  • the process of building a neural network model based on the model parameters of the target neural network can be implemented through model training.
  • each model parameter is an initialization parameter set based on experience, statistical data, or randomly set. This initial model cannot be directly used to perform tasks.
  • obtain the corresponding training set train the initial neural network model based on the training set, and obtain the training results. Then, according to the training results The result and the preset iteration conditions determine whether to continue training the model. If it is determined to continue training the model, it means that the current model parameters have not reached the optimum and there is room for continued optimization. Therefore, the model is updated according to the results of this round of training. parameters, and iteratively train the updated model based on the training set until it is determined to stop training the model, thereby obtaining a trained neural network model.
  • the model parameters correspond to the model parameters of the target neural network.
  • model verification and correction can also be performed based on the verification set.
  • model evaluation can also be performed based on the test set.
  • the embodiments of the present disclosure improve the neural network model. There is no restriction on the acquisition method.
  • the disclosure also provides data processing devices, electronic equipment, and computer-readable storage media, all of which can be used to implement any of the data processing methods provided by the disclosure.
  • data processing devices electronic equipment, and computer-readable storage media, all of which can be used to implement any of the data processing methods provided by the disclosure.
  • Figure 10 is a block diagram of a data processing device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a data processing device, which includes the following modules.
  • the data processing module 101 is used to input the data to be processed into the target neural network, perform data processing based on the spatial attention mechanism of the squeeze and excitation framework, and obtain processing results.
  • the spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain the attention information of the features in the spatial dimension.
  • the data processing module includes a first determination sub-module, a compression sub-module, a conversion sub-module, a fusion sub-module and a second determination sub-module.
  • the first determination sub-module is used to determine the features to be processed based on the data to be processed;
  • the compression sub-module is used to compress the features to be processed from the channel dimension to obtain channel context features;
  • the conversion sub-module is used to for performing feature conversion on the channel context features to obtain spatial attention features; a fusion submodule for feature fusion of the to-be-processed features and the spatial attention features to obtain target features;
  • a second determination submodule Used to determine the processing result according to the target feature; wherein the feature size to be processed, the channel context feature, the spatial attention feature and the target feature in the spatial dimension are the same, and the feature to be processed is
  • the channel context feature and the spatial attention feature have the same first feature size in the channel dimension, and the channel context feature and the spatial attention feature have the same second feature
  • the above functional sub-modules are mapped to the target neural network shown in Figure 3.
  • the first determination sub-module corresponds to the first network structure
  • the compression sub-module corresponds to the context modeling unit
  • the conversion sub-module corresponds to the conversion unit
  • the fusion sub-module corresponds to the fusion unit
  • the second determination sub-module corresponds to the second network structure.
  • the compression sub-module, conversion sub-module, and fusion sub-module also include more fine-grained functional units.
  • relevant content please refer to the corresponding descriptions of the embodiments of the present disclosure, and will not be repeated here.
  • Figure 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides an electronic device, which includes: at least one processor 1101; at least one memory 1102, and one or more I/O interfaces 1103 connected between the processor 1101 and the memory 1102 among them, the memory 1102 stores one or more computer programs that can be executed by at least one processor 501, and the one or more computer programs are executed by at least one processor 1101, so that at least one processor 1101 can execute the above-mentioned Data processing methods.
  • Figure 12 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides an electronic device.
  • the electronic device includes multiple processing cores 1201 and an on-chip network 1202.
  • the multiple processing cores 1201 are connected to the on-chip network 1202.
  • the on-chip network 1202 is used to interact with multiple processing cores.
  • One handles inter-core data and external data.
  • one or more instructions are stored in one or more processing cores 1201, and one or more instructions are processed by one or more processing cores 1201. 1201 is executed so that one or more processing cores 1201 can execute the above data processing method.
  • the electronic device may be a brain-like chip, because the brain-like chip can adopt a vectorized calculation method and needs to be loaded into the neural network through an external memory such as a double data rate (Double Data Rate, DDR) synchronous dynamic random access memory. Model weight information and other parameters. Therefore, the operation efficiency of batch processing in the embodiments of the present disclosure is relatively high.
  • DDR Double Data Rate
  • Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned data processing method when executed by a processor/processing core.
  • Computer-readable storage media may be volatile or non-volatile computer-readable storage media.
  • Embodiments of the present disclosure also provide a computer program product, including computer readable code, or a non-volatile computer readable storage medium carrying the computer readable code.
  • computer readable code When the computer readable code is stored in a processor of an electronic device, When running, the processor in the electronic device executes the above data processing method.
  • computer storage media includes volatile and non-volatile media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data. lossless, removable and non-removable media.
  • Computer storage media include, but are not limited to, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassette, magnetic tape, disk storage or other magnetic storage device, or that can be used to store the desired information and can be accessed by a computer any other media.
  • communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.
  • Computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to various computing/processing devices, or to an external computer or external storage device over a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage on a computer-readable storage medium in the respective computing/processing device .
  • Computer program instructions for performing operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or instructions in one or more programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server implement.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as an Internet service provider through the Internet). connect).
  • LAN local area network
  • WAN wide area network
  • an external computer such as an Internet service provider through the Internet. connect
  • an electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA)
  • the electronic circuit can Computer readable program instructions are executed to implement various aspects of the disclosure.
  • the computer program product described here may be implemented specifically through hardware, software, or a combination thereof.
  • the computer program product is embodied as a computer storage medium.
  • the computer program product is embodied as a software product, such as a Software Development Kit (SDK), etc. wait.
  • SDK Software Development Kit
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, thereby producing a machine that, when executed by the processor of the computer or other programmable data processing apparatus, , resulting in an apparatus that implements the functions/actions specified in one or more blocks in the flowchart and/or block diagram.
  • These computer-readable program instructions can also be stored in a computer-readable storage medium. These instructions cause the computer, programmable data processing device and/or other equipment to work in a specific manner. Therefore, the computer-readable medium storing the instructions includes An article of manufacture that includes instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process , thereby causing instructions executed on a computer, other programmable data processing apparatus, or other equipment to implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions that contains one or more components for implementing the specified logical function(s).
  • Executable instructions may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
  • each block of the block diagram and/or flowchart illustration, and combinations of blocks in the block diagram and/or flowchart illustration can be implemented by special purpose hardware-based systems that perform the specified functions or acts. , or can be implemented using a combination of specialized hardware and computer instructions.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a general illustrative sense only and not for purpose of limitation. In some instances, it will be apparent to those skilled in the art that features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or may be used in conjunction with other embodiments, unless expressly stated otherwise. Features and/or components used in combination. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a data processing method and apparatus, a neural network model, a device, and a medium. The data processing method comprises: inputting data to be processed into a target neural network, and performing data processing on the basis of a spatial attention mechanism of a squeeze-and-excitation framework, so as to obtain a processing result, wherein the spatial attention mechanism is used for compressing features in a channel dimension, and exciting the relevance, in a spatial dimension, of the features which have been subjected to channel compression, so as to obtain attention information of the features in the spatial dimension.

Description

数据处理方法及装置、神经网络模型、设备、介质Data processing methods and devices, neural network models, equipment, media 技术领域Technical field
本公开实施例涉及计算机技术领域,尤其涉及一种数据处理方法及装置、神经网络模型、电子设备、计算机可读存储介质。The embodiments of the present disclosure relate to the field of computer technology, and in particular, to a data processing method and device, a neural network model, electronic equipment, and a computer-readable storage medium.
背景技术Background technique
神经网络等技术已经广泛应用在图像处理、视频处理、语音处理以及文本处理等领域中。在基于神经网络执行相应的任务时,通常需要进行特征提取,并基于提取的特征进行数据处理。Technologies such as neural networks have been widely used in image processing, video processing, speech processing, text processing and other fields. When performing corresponding tasks based on neural networks, feature extraction is usually required, and data processing is performed based on the extracted features.
发明内容Contents of the invention
本公开提供一种数据处理方法及装置、神经网络模型、电子设备、计算机可读存储介质。The present disclosure provides a data processing method and device, a neural network model, electronic equipment, and a computer-readable storage medium.
第一方面,本公开提供了一种数据处理方法,该数据处理方法包括:将待处理数据输入目标神经网络,基于挤压与激励框架的空间注意力机制进行数据处理,获得处理结果;其中,所述空间注意力机制用于从通道维度对特征进行压缩,激励经过通道压缩的特征在空间维度的关联性,获得所述特征在空间维度的注意力信息。In a first aspect, the present disclosure provides a data processing method. The data processing method includes: inputting data to be processed into a target neural network, performing data processing based on the spatial attention mechanism of the squeeze and excitation framework, and obtaining processing results; wherein, The spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain the attention information of the features in the spatial dimension.
第二方面,本公开提供了一种神经网络模型,该神经网络模型是基于目标神经网络的模型参数构建的模型,其中,所述目标神经网络采用本公开实施例中任一项所述的目标神经网络。In a second aspect, the present disclosure provides a neural network model, which is a model constructed based on the model parameters of a target neural network, wherein the target neural network adopts the target described in any one of the embodiments of the present disclosure. Neural Networks.
第三方面,本公开提供了一种数据处理装置,该数据处理装置包括:数据处理模块,用于将待处理数据输入目标神经网络,基于挤压与激励框架的空间注意力机制进行数据处理,获得处理结果;其中,所述空间注意力机制用于从通道维度对特征进行压缩,激励经过通道压缩的特征在空间维度的关联性,获得所述特征在空间维度的注意力信息。In a third aspect, the present disclosure provides a data processing device. The data processing device includes: a data processing module for inputting data to be processed into a target neural network and performing data processing based on the spatial attention mechanism of the extrusion and excitation framework. Obtain processing results; wherein, the spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain attention information of the features in the spatial dimension.
第四方面,本公开提供了一种电子设备,该电子设备包括:至少一个处理器;以及与所述至少一个处理器通信连接的存储器;其中,所述存储器存储有可被所述至少一个处理器执行的一个或多个计算机程序,一个或多个所述计算机程序被所述至少一个处理器执行,以使所述至少一个处理器能够执行上述的数据处理方法。In a fourth aspect, the present disclosure provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores information that can be processed by the at least one processor. One or more computer programs are executed by the processor, and the one or more computer programs are executed by the at least one processor, so that the at least one processor can execute the above-mentioned data processing method.
第五方面,本公开提供了一种计算机可读存储介质,其上存储有计算机程序,其中,所述计算机程序在被处理器/处理核执行时实现上述的数据处理方法。In a fifth aspect, the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the above-mentioned data processing method when executed by a processor/processing core.
在本公开实施例中,将待处理数据输入到目标神经网络,使得目标神经网络基于挤压与激励框架的空间注意力机制进行数据处理,获得处理结果,实现从通道维度对特征的压缩,从而可以降低特征在通道维度的尺寸,减少待处理数据量,从而提升任务处理效率,同时,通过激励经过通道压缩的特征在空间维度的关联性,能够获得特征在空间维度的注意力信息,从而提高处理结果的准确性,进而提升任务处理的准确率。In the embodiment of the present disclosure, the data to be processed is input to the target neural network, so that the target neural network performs data processing based on the spatial attention mechanism of the squeeze and excitation framework, obtains the processing results, and realizes compression of features from the channel dimension, thereby It can reduce the size of features in the channel dimension and reduce the amount of data to be processed, thereby improving task processing efficiency. At the same time, by stimulating the correlation of the channel-compressed features in the spatial dimension, the attention information of the feature in the spatial dimension can be obtained, thereby improving The accuracy of the processing results, thereby improving the accuracy of task processing.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。根据下面参考附图对示例性实施例的详细说明,本公开的其它特征及方面将变得清楚。It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
附图说明Description of drawings
图1为本公开实施例提供的一种数据处理方法的流程图;Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure;
图2为本公开实施例提供的一种数据处理方法的流程图;Figure 2 is a flow chart of a data processing method provided by an embodiment of the present disclosure;
图3为本公开实施例提供的一种目标神经网络的示意图;Figure 3 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure;
图4为本公开实施例提供的一种空间注意力模块的示意图; Figure 4 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure;
图5为本公开实施例提供的一种空间注意力模块的示意图;Figure 5 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure;
图6为本公开实施例提供的一种空间注意力模块的示意图;Figure 6 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure;
图7为本公开实施例提供的一种空间注意力模块的示意图;Figure 7 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure;
图8为本公开实施例提供的一种目标神经网络的示意图;Figure 8 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure;
图9为本公开实施例提供的一种神经网络模型的示意图;Figure 9 is a schematic diagram of a neural network model provided by an embodiment of the present disclosure;
图10为本公开实施例提供的一种数据处理装置的框图;Figure 10 is a block diagram of a data processing device provided by an embodiment of the present disclosure;
图11为本公开实施例提供的一种电子设备的框图;Figure 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure;
图12为本公开实施例提供的一种电子设备的框图。Figure 12 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
具体实施方式Detailed ways
下面结合附图和实施例对本公开作进一步的详细说明。可以理解的是,此处所描述的具体实施例仅仅用于解释本公开,而非对本公开的限定。另外还需要说明的是,为了便于描述,附图中仅示出了与本公开相关的部分而非全部结构。The present disclosure will be further described in detail below in conjunction with the accompanying drawings and examples. It can be understood that the specific embodiments described here are only used to explain the present disclosure, but not to limit the present disclosure. It should also be noted that, for convenience of description, only some but not all structures related to the present disclosure are shown in the drawings.
在执行各类任务时所依据的原始数据(例如,图片、语音、文本、视频等数据)通常是高维信息,其包含了较多的冗余信息,还有可能是稀疏性数据,因此,直接基于原始数据进行处理,计算量过大,任务的执行效率较低。The original data (for example, pictures, speech, text, video, etc.) used when performing various tasks is usually high-dimensional information, which contains more redundant information and may also be sparse data. Therefore, Processing based directly on the original data requires too much calculation and the task execution efficiency is low.
基于此,在相关技术中,通过从原始数据中提取特征的方式,获得维度相对较低的特征数据,进而基于特征数据进行数据处理,以降低计算量。但是,在部分场景中,特征的数据量仍然较大,直接基于特征数据进行数据处理时,计算量较大,可能导致任务的执行效率仍然无法满足用户需求。Based on this, in related technologies, relatively low-dimensional feature data is obtained by extracting features from original data, and then data processing is performed based on the feature data to reduce the amount of calculation. However, in some scenarios, the amount of feature data is still large. When data processing is performed directly based on feature data, the amount of calculation is large, which may result in task execution efficiency still being unable to meet user needs.
有鉴于此,本公开实施例提供一种数据处理方法及装置、神经网络模型、电子设备、计算机可读存储介质。根据本公开实施例的数据处理方法,能够从通道维度对特征的压缩,从而可以降低特征在通道维度的尺寸,减少待处理数据量,从而提升任务处理效率,同时,通过激励经过通道压缩的特征在空间维度的关联性,能够获得特征在空间维度的注意力信息,从而提高处理结果的准确性,进而提升任务处理的准确率。In view of this, embodiments of the present disclosure provide a data processing method and device, a neural network model, electronic equipment, and a computer-readable storage medium. The data processing method according to the embodiment of the present disclosure can compress features from the channel dimension, thereby reducing the size of the features in the channel dimension, reducing the amount of data to be processed, thereby improving task processing efficiency, and at the same time, by stimulating the features that have been compressed by the channel The correlation in the spatial dimension can obtain the attention information of the feature in the spatial dimension, thereby improving the accuracy of the processing results and thus improving the accuracy of task processing.
根据本公开实施例的数据处理方法可以由终端设备或服务器等电子设备执行,终端设备可以为用户设备(User Equipment,UE)、移动设备、终端、蜂窝电话、无绳电话、个人数字助理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等,该方法可以通过处理器调用存储器中存储的计算机可读程序指令的方式来实现。或者,可通过服务器执行该方法。The data processing method according to the embodiment of the present disclosure can be executed by an electronic device such as a terminal device or a server. The terminal device can be a user equipment (User Equipment, UE), a mobile device, a terminal, a cellular phone, a cordless phone, or a personal digital assistant (Personal Digital Assistant). Assistant, PDA), handheld devices, computing devices, vehicle-mounted devices, wearable devices, etc., the method can be implemented by the processor calling computer-readable program instructions stored in the memory. Alternatively, the method can be executed via the server.
本公开实施例第一方面提供一种数据处理方法。A first aspect of the embodiment of the present disclosure provides a data processing method.
图1为本公开实施例提供的一种数据处理方法的流程图。参照图1,该方法包括如下步骤。Figure 1 is a flow chart of a data processing method provided by an embodiment of the present disclosure. Referring to Figure 1, the method includes the following steps.
在步骤S11中,将待处理数据输入目标神经网络,基于挤压与激励框架的空间注意力机制进行数据处理,获得处理结果。In step S11, the data to be processed is input into the target neural network, the data is processed based on the spatial attention mechanism of the squeeze and excitation framework, and the processing results are obtained.
其中,空间注意力机制用于从通道维度对特征进行压缩,激励经过通道压缩的特征在空间维度的关联性,获得特征在空间维度的注意力信息。Among them, the spatial attention mechanism is used to compress features from the channel dimension, stimulate the correlation of the channel-compressed features in the spatial dimension, and obtain the attention information of the features in the spatial dimension.
在一些可选的实现方式中,挤压与激励(Squeeze-and-Excitation,SE)框架可以由SE网络结构(SE Network,SENet)实现。在相关技术中,SENet主要用于对卷积特性的通道之间的相互依赖关系进行建模,以提高特征在通道维度的表征效果。在该处理过程中,虽然通过在空间维度的特征挤压,降低了数据处理量,并通过激励通道之间的关联性获得了通道注意力信息,但是同时也丢失了特征在空间维度的注意力信息。有鉴于此,本公开实施例提供一种基于挤压与激励框架的空间注意力机制,挤压与激励框架作用在空间注意力维度,聚焦于特征在空间维度的注意力信息,相应的实现方式为在通道维度对特征进行压缩,基于通道压缩后的特征激励其在空间维度的关联性,从而获 得特征在空间维度的注意力信息。In some optional implementations, the Squeeze-and-Excitation (SE) framework can be implemented by the SE Network structure (SE Network, SENet). In related technologies, SENet is mainly used to model the interdependence between channels of convolutional features to improve the representation effect of features in the channel dimension. In this process, although the amount of data processing is reduced by squeezing features in the spatial dimension, and channel attention information is obtained by stimulating the correlation between channels, the attention of features in the spatial dimension is also lost. information. In view of this, embodiments of the present disclosure provide a spatial attention mechanism based on a squeeze and excitation framework. The squeeze and excitation framework acts in the spatial attention dimension and focuses on the attention information of features in the spatial dimension. The corresponding implementation method In order to compress features in the channel dimension, the correlation in the spatial dimension is stimulated based on the channel compressed features to obtain Obtain the attention information of features in the spatial dimension.
在一些可选的实现方式中,上述空间注意力机制可用于处理图像数据、语音数据、文本数据、视频数据中的至少一种。即在步骤S11中,待处理数据可以包括图像数据、语音数据、文本数据、视频数据中的至少一种。In some optional implementations, the above spatial attention mechanism can be used to process at least one of image data, voice data, text data, and video data. That is, in step S11, the data to be processed may include at least one of image data, voice data, text data, and video data.
在一些可选的实现方式中,将待处理数据输入目标神经网络之后,目标神经网络基于挤压与激励框架的空间注意力机制进行数据处理,获得相应的处理结果。In some optional implementations, after the data to be processed is input into the target neural network, the target neural network processes the data based on the spatial attention mechanism of the squeeze and excitation framework to obtain corresponding processing results.
在一些可选的实现方式中,目标神经网络可用于执行图像处理任务、语音处理任务、文本处理任务、视频处理任务中的至少一种。与之相应的,处理结果包括图像处理结果、语音处理结果、文本处理结果、视频处理结果中的至少一种(其中,处理可以包括识别、分类、标注等操作),其与待处理数据的类型、待处理数据的内容、以及目标神经网络的执行任务等相关。本公开实施例对目标神经网络所能执行的任务、相应的待处理数据及处理结果均不作限制。In some optional implementations, the target neural network can be used to perform at least one of image processing tasks, speech processing tasks, text processing tasks, and video processing tasks. Correspondingly, the processing results include at least one of image processing results, speech processing results, text processing results, and video processing results (where the processing may include operations such as identification, classification, and labeling), which are related to the type of data to be processed. , the content of the data to be processed, and the execution tasks of the target neural network, etc. The embodiments of this disclosure place no restrictions on the tasks that the target neural network can perform, the corresponding data to be processed, and the processing results.
示例性地,目标神经网络在执行图像处理任务时,待处理数据包括至少一张待处理图像,将该待处理图像输入目标神经网络之后,通过其中的一些网络结构的处理(例如,卷积层),从待处理图像中提取出待处理图像特征。其中,待处理图像特征可以包括不同层级的图像特征,层级越高,图像特征的语义表征效果越好,层级越低,图像特征的空间表征效果越好。在获得待处理图像特征之后,目标神经网络从通道维度对其进行压缩,获得图像通道上下文特征,并进一步激励图像通道上下文特征在空间维度的关联性,获得图像空间注意力特征(可以表征空间维度的注意力信息),然后将该图像空间注意力特征与对应的待处理图像特征进行融合,即获得可以表征空间注意力信息的目标图像特征。上述目标图像特征可进一步应用于图像分类、图像标注、图像识别等图像处理过程中。For example, when the target neural network performs an image processing task, the data to be processed includes at least one image to be processed. After the image to be processed is input to the target neural network, it is processed through some network structures (for example, convolutional layers) ), extract the image features to be processed from the image to be processed. Among them, the image features to be processed can include image features at different levels. The higher the level, the better the semantic representation effect of the image features. The lower the level, the better the spatial representation effect of the image features. After obtaining the image features to be processed, the target neural network compresses them from the channel dimension to obtain the image channel context features, and further stimulates the correlation of the image channel context features in the spatial dimension to obtain the image spatial attention features (which can characterize the spatial dimensions attention information), and then fuse the image spatial attention features with the corresponding image features to be processed, that is, obtain the target image features that can represent the spatial attention information. The above target image features can be further used in image processing processes such as image classification, image annotation, and image recognition.
示例性地,目标神经网络在执行图像分类任务时,待处理数据包括至少一张待分类图像,将该待分类图像输入目标神经网络之后,通过其中的一些网络结构的处理(例如,卷积层),从待分类图像中提取出待分类图像特征。其中,待分类图像特征可以包括不同层级的图像特征,层级越高,图像特征的语义表征效果越好,层级越低,图像特征的空间表征效果越好。在获得待分类图像特征之后,目标神经网络从通道维度对其进行压缩,获得图像通道上下文特征,并进一步激励图像通道上下文特征在空间维度的关联性,获得图像空间注意力特征(可以表征空间维度的注意力信息),然后将该图像空间注意力特征与对应的待分类图像特征进行融合,即获得可以表征空间注意力信息的目标图像特征。基于上述目标图像特征,通过相应的图像分类算法,即可获得图像分类结果。For example, when the target neural network performs an image classification task, the data to be processed includes at least one image to be classified. After the image to be classified is input to the target neural network, it is processed through some network structures (for example, convolutional layers) ), extract the image features to be classified from the image to be classified. Among them, the image features to be classified can include image features at different levels. The higher the level, the better the semantic representation effect of the image features. The lower the level, the better the spatial representation effect of the image features. After obtaining the image features to be classified, the target neural network compresses them from the channel dimension to obtain the image channel context features, and further stimulates the correlation of the image channel context features in the spatial dimension to obtain the image spatial attention features (which can characterize the spatial dimensions attention information), and then fuse the image spatial attention features with the corresponding image features to be classified, that is, obtain the target image features that can represent the spatial attention information. Based on the above target image characteristics, the image classification results can be obtained through the corresponding image classification algorithm.
需要说明的是,待分类图像本质上属于外部技术数据,对其实施的一系列技术处理至少包括:将待分类图像输入目标神经网络,基于挤压与激励框架的空间注意力机制进行数据处理,获得图像分类结果。由此可知,在本公开实施例中,为了解决相应的技术问题,采用了一系列技术手段,这些技术手段中的技术操作,包括但不限于数据输入、特征提取、特征压缩以及特征转换等,上述技术操作均对应相应的技术特征,其都是对给定数据的计算机化处理和操作。通过这些技术操作,一方面可以减少数据处理量,降低对硬件设备的要求,另一方面还尽量保持了待分类图像的特征,以保障图像分类结果的准确性。由此可知,本公开实施例的技术方案虽然没有直接改变硬件设备的处理性能、运算速度、计算精度,但是改变了硬件设备在处理待分类图像等数据时所需的资源开销,从而提高了资源利用率。换言之,由于资源开销的减少,使得一些原本无法处理图像分类等任务的硬件设备,能够基于本公开实施例的数据处理方法执行相应的图像分类等数据处理,从而降低了对硬件设备的要求。It should be noted that the images to be classified are essentially external technical data, and a series of technical processing implemented on them at least include: inputting the images to be classified into the target neural network, data processing based on the spatial attention mechanism of the squeeze and excitation framework, Get image classification results. It can be seen from this that in the embodiments of the present disclosure, in order to solve the corresponding technical problems, a series of technical means are adopted. The technical operations in these technical means include but are not limited to data input, feature extraction, feature compression, feature conversion, etc., The above technical operations all correspond to corresponding technical features, and they are all computerized processing and operations of given data. Through these technical operations, on the one hand, the amount of data processing can be reduced and the requirements for hardware equipment can be reduced. On the other hand, the characteristics of the images to be classified can be maintained as much as possible to ensure the accuracy of the image classification results. It can be seen from this that although the technical solutions of the embodiments of the present disclosure do not directly change the processing performance, operation speed, and calculation accuracy of the hardware device, it changes the resource overhead required by the hardware device when processing data such as images to be classified, thereby improving resources. Utilization. In other words, due to the reduction in resource overhead, some hardware devices that were originally unable to handle tasks such as image classification can perform corresponding data processing such as image classification based on the data processing methods of embodiments of the present disclosure, thereby reducing the requirements for hardware devices.
其他任务处理过程与上述内容类似,在此不再展开描述。Other task processing processes are similar to the above and will not be described here.
需要说明的是,在一些可选的实现方式中,上述空间注意力机制在目标神经网络中对应某些网 络层或者网络结构,并由对应的网络层或网络结构实现上述处理过程。It should be noted that in some optional implementations, the above spatial attention mechanism corresponds to certain networks in the target neural network. network layer or network structure, and the corresponding network layer or network structure implements the above processing process.
示例性的,目标神经网络包括空间注意力模块,该空间注意力模块是根据基于挤压与激励框架的空间注意力机制构建的模块,用于从通道维度对特征进行压缩,激励经过通道压缩的特征在空间维度的关联性,获得特征在空间维度的注意力信息。应当理解,空间注意力模块本身还可以包括粒度更小的功能单元,各个功能单元之间具有一定的连接关系,基于各个功能单元及其连接关系,可以获得特征在空间维度的注意力信息。Exemplarily, the target neural network includes a spatial attention module. The spatial attention module is a module built according to the spatial attention mechanism based on the squeeze and excitation framework. It is used to compress features from the channel dimension and excite the features that have been compressed by the channel. The correlation of features in the spatial dimension and the attention information of the features in the spatial dimension are obtained. It should be understood that the spatial attention module itself can also include functional units with smaller granularity, and each functional unit has a certain connection relationship. Based on each functional unit and its connection relationship, the attention information of the feature in the spatial dimension can be obtained.
需要说明的是,基于挤压与激励框架的空间注意力机制仅是目标神经网络进行数据处理时所使用的处理机制之一,目标神经网络还可以基于其他数据处理机制进行数据处理,本公开实施例对此不作限制。换言之,基于挤压与激励框架的空间注意力机制对待处理数据进行处理,可以是整个数据处理过程中的一个处理步骤,也可以对应整个数据过程。对应到目标神经网络中,该目标神经网络可以只包括空间注意力模块,也可以包括除空间注意力模块之外的其他网络层或网络结构,本公开实施例对此不作限制。It should be noted that the spatial attention mechanism based on the squeeze and excitation framework is only one of the processing mechanisms used by the target neural network for data processing. The target neural network can also perform data processing based on other data processing mechanisms. This disclosure implements There is no restriction on this. In other words, the spatial attention mechanism based on the squeeze and excitation framework processes the data to be processed. It can be a processing step in the entire data processing process, or it can correspond to the entire data process. Corresponding to the target neural network, the target neural network may only include a spatial attention module, or may include other network layers or network structures in addition to the spatial attention module. This is not limited in the embodiments of the present disclosure.
根据本公开实施例,将待处理数据输入到目标神经网络,使得目标神经网络基于挤压与激励框架的空间注意力机制进行数据处理,获得处理结果,实现从通道维度对特征的压缩,从而可以降低特征在通道维度的尺寸,减少待处理数据量,从而提升任务处理效率,同时,通过激励经过通道压缩的特征在空间维度的关联性,能够获得特征在空间维度的注意力信息,从而提高处理结果的准确性,进而提升任务处理的准确率。According to the embodiment of the present disclosure, the data to be processed is input to the target neural network, so that the target neural network performs data processing based on the spatial attention mechanism of the squeeze and excitation framework, obtains the processing results, and realizes compression of features from the channel dimension, so that it can Reduce the size of the feature in the channel dimension and reduce the amount of data to be processed, thereby improving task processing efficiency. At the same time, by stimulating the correlation of the channel-compressed features in the spatial dimension, the attention information of the feature in the spatial dimension can be obtained, thereby improving processing The accuracy of the results, thereby improving the accuracy of task processing.
图2为本公开实施例提供的一种数据处理方法的流程图。参照图2,该方法包括如下步骤。Figure 2 is a flow chart of a data processing method provided by an embodiment of the present disclosure. Referring to Figure 2, the method includes the following steps.
在步骤S21中,根据待处理数据确定待处理特征。In step S21, characteristics to be processed are determined based on the data to be processed.
在步骤S22中,从通道维度对待处理特征进行特征压缩,获得通道上下文特征。In step S22, feature compression is performed on the features to be processed from the channel dimension to obtain channel context features.
在步骤S23中,对通道上下文特征进行特征转换,获得空间注意力特征。In step S23, feature conversion is performed on the channel context features to obtain spatial attention features.
在步骤S24中,对待处理特征和空间注意力特征进行特征融合,获得目标特征。In step S24, feature fusion is performed on the features to be processed and the spatial attention features to obtain the target features.
在步骤S25中,根据目标特征,确定处理结果。In step S25, the processing result is determined based on the target characteristics.
其中,待处理特征、通道上下文特征、空间注意力特征以及目标特征在空间维度的特征尺寸相同,待处理特征与目标特征在通道维度具有相同的第一特征尺寸,通道上下文特征与空间注意力特征在通道维度具有相同的第二特征尺寸,且第一特征尺寸大于第二特征尺寸。换言之,在上述数据处理过程中,可以先对待处理特征在通道维度进行压缩,然后基于压缩后的特征进行转换处理,获得空间维度的注意力信息,最后将空间维度的注意力信息融合到待处理特征中,从而获得目标特征。Among them, the feature size to be processed, the channel context feature, the spatial attention feature and the target feature are the same in the spatial dimension. The feature to be processed and the target feature have the same first feature size in the channel dimension. The channel context feature and the spatial attention feature are the same. have the same second feature size in the channel dimension, and the first feature size is larger than the second feature size. In other words, during the above data processing process, the features to be processed can be compressed in the channel dimension first, and then transformed based on the compressed features to obtain the attention information of the spatial dimension. Finally, the attention information of the spatial dimension can be integrated into the data to be processed. features to obtain the target features.
在一些可选的实现方式中,待处理特征可以是基于待处理数据获得的特征,其与待处理数据之间可以具有一定的对应关系。In some optional implementations, the features to be processed may be features obtained based on the data to be processed, and may have a certain corresponding relationship with the data to be processed.
在一些可选的实现方式中,在步骤S21中,根据待处理数据确定待处理特征,包括:直接对待处理数据进行特征提取,获得待处理特征;或者,先对待处理数据进行一些数据处理操作,获得中间数据,再对中间数据进行特征提取,获得待处理特征;或者,先对待处理数据进行特征提取,获得初始特征,再对初始特征进行一些数据处理操作,获得待处理特征。In some optional implementations, in step S21, determining the features to be processed based on the data to be processed includes: directly performing feature extraction on the data to be processed to obtain the features to be processed; or, first performing some data processing operations on the data to be processed, Obtain the intermediate data, and then perform feature extraction on the intermediate data to obtain the features to be processed; or, first perform feature extraction on the data to be processed to obtain the initial features, and then perform some data processing operations on the initial features to obtain the features to be processed.
需要说明的是,以上对于待处理特征的确定方式仅是举例说明,本公开实施例不限制根据待处理数据确定待处理特征的方式。It should be noted that the above methods for determining the features to be processed are only examples, and the embodiments of the present disclosure do not limit the method of determining the features to be processed based on the data to be processed.
在获得待处理特征之后,可以利用基于挤压与激励框架的空间注意力机制,对待处理特征在通道维度进行压缩,并激励经过通道压缩的特征在空间维度的关联性,获得待处理特征在空间维度的注意力信息,将空间维度的注意力信息与待处理特征进行特征融合,从而获得表征效果更佳的目标特征,以便基于目标特征确定相应的处理结果。上述处理过程对应本公开实施例的步骤S22至步骤S25。 After obtaining the features to be processed, the spatial attention mechanism based on the extrusion and excitation framework can be used to compress the features to be processed in the channel dimension, and stimulate the correlation of the channel-compressed features in the spatial dimension to obtain the spatial dimension of the features to be processed. Dimensional attention information, feature fusion of spatial dimension attention information and features to be processed, to obtain target features with better representation, so as to determine the corresponding processing results based on the target features. The above processing process corresponds to step S22 to step S25 in the embodiment of the present disclosure.
在一些可选的实现方式中,步骤S22主要用于对待处理特征在通道维度进行压缩,以降低数据处理量。其中,通道上下文特征用于表征待处理特征在通道维度的上下文关联关系。In some optional implementations, step S22 is mainly used to compress the features to be processed in the channel dimension to reduce the amount of data processing. Among them, the channel context feature is used to characterize the contextual relationship of the features to be processed in the channel dimension.
In some optional implementations, in step S22, compressing the feature to be processed along the channel dimension to obtain the channel context feature includes: performing pooling on the feature to be processed in the channel dimension to obtain the channel context feature.
Pooling is an important concept in neural networks; in essence it is a form of downsampling. Pooling enlarges the receptive field of a feature and reduces the number of parameters, while preserving certain invariances of the feature (for example, rotation invariance, translation invariance, and scale invariance). Common pooling methods include average pooling, max pooling, and global pooling. Average pooling takes the mean of the feature points in a neighborhood; max pooling takes the maximum of the feature points in a neighborhood; global pooling treats the entire feature as a single window for pooling, which can be used to reduce the feature dimension, and global pooling is usually combined with average pooling or max pooling (for example, global average pooling and global max pooling). In practice, pooling can be implemented with various pooling functions.
For example, pooling the feature to be processed in the channel dimension to obtain the channel context feature includes: performing average pooling on the feature to be processed in the channel dimension to obtain a channel context feature whose feature scale in the channel dimension is n1, where 1&lt;n1&lt;N, N is the number of channels of the feature to be processed, and N≥1.
For example, pooling the feature to be processed in the channel dimension to obtain the channel context feature includes: performing max pooling on the feature to be processed in the channel dimension to obtain a channel context feature whose feature scale in the channel dimension is n2, where 1&lt;n2&lt;N and N is the number of channels of the feature to be processed.
It should be noted that, since n1 and n2 are smaller than N, after the above average pooling or max pooling, the feature scale of the feature to be processed in the channel dimension is reduced (from N to n1, or from N to n2), and the corresponding data volume is reduced accordingly, which relieves the task processing load.
Furthermore, since n1 and n2 are both integers greater than 1, there is still room to reduce the feature scale of the feature to be processed in the channel dimension. On this basis, in some optional implementations, the feature to be processed is compressed in the channel dimension to the greatest extent, so as to minimize the amount of data to be processed.
For example, pooling the feature to be processed in the channel dimension to obtain the channel context feature includes: performing global average pooling on the feature to be processed in the channel dimension to obtain a channel context feature whose feature scale in the channel dimension is 1 (that is, the number of channels n=1).
For example, pooling the feature to be processed in the channel dimension to obtain the channel context feature includes: performing global max pooling on the feature to be processed in the channel dimension to obtain a channel context feature whose feature scale in the channel dimension is 1 (that is, the number of channels n=1).
It should be understood that, since the feature scale of the feature to be processed in the channel dimension is compressed to 1, there is no room for further compression, and the corresponding amount of data processing is therefore the minimum.
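For illustration only, the following PyTorch-style sketch shows one possible realization of the channel-dimension pooling described above; the (batch, channel, height, width) tensor layout, the function names, and the example sizes are assumptions and are not part of the original disclosure.

```python
import torch

def channel_global_avg_pool(x: torch.Tensor) -> torch.Tensor:
    # x: (b, N, h, w) -> channel context feature (b, 1, h, w)
    return x.mean(dim=1, keepdim=True)

def channel_global_max_pool(x: torch.Tensor) -> torch.Tensor:
    # x: (b, N, h, w) -> channel context feature (b, 1, h, w)
    return x.amax(dim=1, keepdim=True)

x = torch.randn(2, 256, 28, 28)        # feature to be processed, N = 256 channels
ctx = channel_global_avg_pool(x)       # (2, 1, 28, 28): channels compressed to 1, spatial size unchanged
```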
It should be noted that, when the channel context feature is obtained by pooling, the corresponding parameters are usually hyperparameters, and the channel context feature obtained by pooling can retain only limited channel context information. Convolution can also compress the feature size, and learnable parameters (for example, the weights of the convolution kernels) can be introduced into the convolution as needed, so that the convolved feature can better preserve the feature information while being compressed. On this basis, in some optional implementations, the channel context feature is obtained by convolution, so that the channel context feature can better retain the channel context information.
In some optional implementations, in step S22, compressing the feature to be processed along the channel dimension to obtain the channel context feature includes: performing convolution on the feature to be processed in the channel dimension to obtain the channel context feature.
It should be noted that, similarly to pooling, the feature scale of the channel context feature in the channel dimension can be adjusted by modifying the parameters of the convolution.
For example, convolving the feature to be processed in the channel dimension to obtain the channel context feature includes: convolving the feature to be processed in the channel dimension to obtain a channel context feature whose feature scale in the channel dimension is n3, where 1&lt;n3&lt;N and N is the number of channels of the feature to be processed.
For example, convolving the feature to be processed in the channel dimension to obtain the channel context feature includes: performing global convolution on the feature to be processed in the channel dimension to obtain a channel context feature whose feature scale in the channel dimension is 1 (that is, the number of channels n=1).
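One possible reading of the "global convolution in the channel dimension" described above is a learnable 1x1 convolution that maps all N input channels to a single output channel while leaving the spatial size unchanged; the sketch below illustrates this reading, and the interpretation itself, the layer name, and the sizes used are assumptions rather than part of the original disclosure.

```python
import torch
import torch.nn as nn

N = 256                                    # channels of the feature to be processed
# Assumed reading: a 1x1 convolution that compresses all N channels to 1 channel
# with learnable weights; nn.Conv2d(N, n3, kernel_size=1) would similarly give an
# n3-channel context feature.
channel_squeeze = nn.Conv2d(N, 1, kernel_size=1, bias=False)

x = torch.randn(2, N, 28, 28)
ctx = channel_squeeze(x)                   # (2, 1, 28, 28)
```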
As described above, after the channel context feature is obtained, a spatial attention feature that can represent the spatial attention information can be obtained by performing feature transformation on the channel context feature to excite the attention information of the feature in the spatial dimension.
In some optional implementations, in step S23, performing feature transformation on the channel context feature to obtain the spatial attention feature includes: performing feature extraction on the channel context feature in the spatial dimension to obtain a first intermediate feature; performing activation on the first intermediate feature to obtain a second intermediate feature; performing feature restoration on the second intermediate feature to obtain a third intermediate feature; and performing activation on the third intermediate feature to obtain the spatial attention feature.
In other words, the spatial attention feature that represents the attention information in the spatial dimension can be obtained by performing feature transformation on the channel context feature. Since the object to be processed (the channel context feature), the processing dimension (the spatial dimension), and the information to be obtained (the attention information) are thereby specified, the attention information of the channel context feature in the spatial dimension can be obtained in any manner, and the embodiments of the present disclosure do not limit this.
In some optional implementations, performing feature extraction on the channel context feature in the spatial dimension to obtain the first intermediate feature includes: performing a first convolution on the channel context feature in the spatial dimension to obtain the first intermediate feature.
For example, the first convolution corresponds to one convolution kernel, the size of the convolution kernel is 3*3, and the stride is 2.
In other words, the first convolution does not change the feature scale of the channel context feature in the channel dimension; it mainly squeezes the channel context feature in the spatial dimension, and this squeezing reduces the amount of data to be processed. However, considering the requirement on processing accuracy, in some optional implementations the amount of data processing can be appropriately increased in exchange for higher processing accuracy. A corresponding approach is to use multiple convolution kernels to obtain features of multiple output channels and then take the mean in the channel dimension (channel-mean) to improve processing accuracy.
In some optional implementations, performing feature extraction on the channel context feature in the spatial dimension to obtain the first intermediate feature includes: performing a second convolution on the channel context feature in the spatial dimension to obtain fourth intermediate features corresponding to multiple channels; and determining the mean of the multiple fourth intermediate features in the channel dimension to obtain the first intermediate feature.
The second convolution squeezes the channel context feature in the spatial dimension while appropriately expanding it in the channel dimension (that is, expanding the number of channels), and the first intermediate feature is finally obtained by channel averaging.
For example, the second convolution corresponds to four convolution kernels, each corresponding to one channel, the size of each convolution kernel is 7*7, the stride is 4, and the dilation factor is 2. In other words, the second convolution expands the channel context feature into four channels to obtain spatial attention information, and the mean of the spatial attention information of the four channels is then determined by channel averaging, thereby obtaining the first intermediate feature.
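The following sketch shows one possible form of the "second convolution plus channel mean" step just described. The kernel size, stride, and dilation follow the text; the padding value and the example spatial size are assumptions chosen so that a 28x28 input maps to a 7x7 output.

```python
import torch
import torch.nn as nn

# Assumed padding=5 so that a 28x28 input yields a 7x7 output with
# kernel_size=7, stride=4, dilation=2 (effective kernel size 13).
second_conv = nn.Conv2d(1, 4, kernel_size=7, stride=4, dilation=2, padding=5)

ctx = torch.randn(2, 1, 28, 28)                          # channel context feature
fourth = second_conv(ctx)                                # (2, 4, 7, 7): four output channels
first_intermediate = fourth.mean(dim=1, keepdim=True)    # (2, 1, 7, 7): channel mean
```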
In some optional implementations, after the first intermediate feature is obtained, nonlinear activation may be performed on the first intermediate feature to increase the nonlinearity of the network and obtain better processing results.
In some optional implementations, performing activation on the first intermediate feature to obtain the second intermediate feature includes: performing nonlinear activation on the first intermediate feature based on a Rectified Linear Unit (ReLU) function to obtain the second intermediate feature.
It should be noted that the above nonlinear activation function is merely an example; the hyperbolic tangent (Tanh) function, the Exponential Linear Unit (ELU) function, the Gaussian Error Linear Unit (GELU) function, and the like can all be used to activate the first intermediate feature, and the embodiments of the present disclosure do not limit this.
As described above, the channel context feature is convolved to obtain the first intermediate feature, which is then activated to obtain the second intermediate feature. Usually, after features are extracted from the channel context feature by convolution, the size of the output feature (that is, the first intermediate feature) becomes smaller. In some cases the reduced feature needs to be restored to its original size (that is, the feature size of the channel context feature) for further computation. This operation, which enlarges the feature size to map a feature from a small resolution to a large resolution, is called upsampling. Transposed convolution (deconvolution) is one implementation of upsampling; in essence it is a special forward convolution, which first enlarges the original feature by zero padding at a certain ratio, then rotates the convolution kernel, and finally performs a forward convolution. In the embodiments of the present disclosure, a first deconvolution may be performed on the second intermediate feature to obtain a spatial attention feature with the same size as the channel context feature.
In some optional implementations, performing feature restoration on the second intermediate feature to obtain the third intermediate feature includes: performing a first deconvolution on the second intermediate feature to obtain the third intermediate feature.
For example, the first deconvolution corresponds to one convolution kernel, the size of the convolution kernel is 3*3, and the stride is 2.
Similarly to obtaining the first intermediate feature from the channel context feature, the amount of data processing can be appropriately increased to improve processing accuracy. The corresponding approach is to use multiple convolution kernels to obtain multiple output channels and then take the mean in the channel dimension.
In some optional implementations, performing feature restoration on the second intermediate feature to obtain the third intermediate feature includes: performing a second deconvolution on the second intermediate feature to obtain fifth intermediate features corresponding to multiple channels; and obtaining the third intermediate feature based on the mean of the multiple fifth intermediate features in the channel dimension.
For example, the second deconvolution corresponds to four convolution kernels, each corresponding to one channel, the size of each convolution kernel is 7*7, the stride is 4, and the dilation factor is 2. In other words, the second deconvolution expands the second intermediate feature into four channels, each corresponding to one fifth intermediate feature, and the mean of the four fifth intermediate features is then determined by channel averaging, thereby obtaining the third intermediate feature.
It should be noted that the parameters used in the above convolutions and deconvolutions are merely examples, and the embodiments of the present disclosure do not limit them.
Similarly to the first intermediate feature, after the third intermediate feature is obtained, nonlinear activation may be performed on it, and normalization is also required to facilitate subsequent computation. The nonlinear activation and normalization may be implemented by a single function that provides both nonlinear activation and normalization, or by a nonlinear activation function and a normalization function separately, and the embodiments of the present disclosure do not limit this.
In some optional implementations, performing activation on the third intermediate feature to obtain the spatial attention feature includes: performing nonlinear normalized activation on the third intermediate feature based on a Sigmoid function to obtain the spatial attention feature. Since the Sigmoid function provides both nonlinear activation and normalization, the spatial attention feature can be obtained directly based on the Sigmoid function.
In some optional implementations, performing activation on the third intermediate feature to obtain the spatial attention feature includes: performing nonlinear activation on the third intermediate feature based on the ReLU function, and normalizing the nonlinear activation result based on the Softmax function to obtain the spatial attention feature.
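The short sketch below contrasts the two activation options just described. The axis over which the Softmax is applied (the spatial positions) is an assumption made for illustration, as is the example tensor size.

```python
import torch

third = torch.randn(2, 1, 28, 28)          # third intermediate feature

# Option 1: Sigmoid provides nonlinear activation and normalization in one step.
attn_sigmoid = torch.sigmoid(third)

# Option 2: ReLU activation followed by Softmax normalization; normalizing over
# the spatial positions is an assumed choice of Softmax axis.
b, c, h, w = third.shape
attn_softmax = torch.softmax(torch.relu(third).view(b, c, h * w), dim=-1).view(b, c, h, w)
```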
After the spatial attention feature is obtained, the target feature that can well represent the spatial attention information can be obtained by feature fusion.
In some optional implementations, in step S24, fusing the feature to be processed and the spatial attention feature to obtain the target feature includes: adding the feature to be processed and the spatial attention feature point by point to obtain the target feature.
In some optional implementations, in step S24, fusing the feature to be processed and the spatial attention feature to obtain the target feature includes: multiplying the feature to be processed and the spatial attention feature point by point to obtain the target feature.
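As a minimal sketch of the two fusion options in step S24, the snippet below applies a single-channel spatial attention map to every channel of the feature to be processed by broadcasting; the broadcasting behaviour and the example shapes are assumptions consistent with the worked example later in the text.

```python
import torch

x = torch.randn(2, 256, 28, 28)         # feature to be processed
attn = torch.rand(2, 1, 28, 28)         # spatial attention feature (one value per spatial position)

# Point-by-point fusion; the single attention channel broadcasts over all 256 channels.
target_mul = x * attn                    # point-wise multiplication, (2, 256, 28, 28)
target_add = x + attn                    # point-wise addition,       (2, 256, 28, 28)
```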
After the target feature is obtained, in step S25 the processing result can be determined based on the target feature.
In some optional implementations, determining the processing result based on the target feature includes: determining the processing result directly based on the target feature; or performing some data processing operations on the target feature to obtain the processing result.
As described above, the embodiments of the present disclosure implement the corresponding data processing method through the above steps. In some optional implementations, the above method corresponds to some network layers or network structures in the target neural network, and these network layers or network structures implement the above data processing method.
The data processing method of the embodiments of the present disclosure is described below with reference to Figures 3 to 8.
Figure 3 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure. Referring to Figure 3, the target neural network includes a first network structure, a spatial attention module, and a second network structure, where the spatial attention module includes a context modelling unit, a transformation unit, and a fusion unit.
In some optional implementations, the data to be processed is input into the target neural network; the first network structure located before the spatial attention module processes the data to be processed to obtain the feature to be processed, and the feature to be processed serves as the input data of the spatial attention module. The spatial attention module compresses the feature to be processed along the channel dimension through the context modelling unit to obtain the channel context feature and inputs the channel context feature into the transformation unit; the transformation unit performs feature transformation on the channel context feature to obtain the spatial attention feature and inputs the spatial attention feature into the fusion unit; the fusion unit fuses the feature to be processed and the spatial attention feature, for example by point-wise multiplication or point-wise addition, to obtain the target feature and inputs the target feature into the second network structure. The second network structure performs corresponding data processing based on the target feature to obtain the processing result.
Mapping the network structures and modules of the above target neural network to the embodiments of the present disclosure, the first network structure is used to perform the processing of step S21, the context modelling unit is used to perform the processing of step S22, the transformation unit is used to perform the processing of step S23, the fusion unit is used to perform the processing of step S24, and the second network structure is used to perform the processing of step S25.
It should be noted that the first network structure and the second network structure are abstract network structures; their internal structures may be the same or different, and the embodiments of the present disclosure do not limit this. Further, in some optional implementations, the first network structure and the second network structure can be set according to task processing requirements, statistical data, experience, and other information. For example, the first network structure may include any one or more of network layers such as convolutional layers, pooling layers, connection layers, and activation layers, and the second network structure may likewise include any one or more of such network layers.
Figure 3 shows the framework of the target neural network only at a functional level in a relatively simple way. In some optional implementations, each of the above network structures or modules may further be composed of finer-grained functional units. In the embodiments of the present disclosure, the structure of the spatial attention module in the target neural network is of primary concern and the other network structures are not limited, so only the functional units of the spatial attention module of the target neural network are shown in Figure 4.
Figure 4 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure. Referring to Figure 4, the spatial attention module includes a context modelling unit, a transformation unit, and a fusion unit, where the transformation unit includes a feature extraction layer, a first activation layer, a feature restoration layer, and a second activation layer.
In some optional implementations, after the obtained feature to be processed is input into the spatial attention module, the context modelling unit first compresses it along the channel dimension to obtain the channel context feature and inputs the channel context feature into the feature extraction layer; the feature extraction layer performs feature extraction on the channel context feature in the spatial dimension to obtain the first intermediate feature and inputs it into the first activation layer; the first activation layer activates the first intermediate feature to obtain the second intermediate feature and inputs it into the feature restoration layer; the feature restoration layer performs feature restoration on the second intermediate feature to obtain the third intermediate feature and inputs it into the second activation layer; the second activation layer activates the third intermediate feature to obtain the spatial attention feature and inputs it into the fusion unit; the fusion unit fuses the feature to be processed and the spatial attention feature, for example by point-wise multiplication or point-wise addition, to obtain the target feature and outputs the target feature, so that other network structures of the target neural network can perform data processing based on the target feature and obtain the corresponding processing result.
In Figure 4, feature compression can be implemented by pooling, feature extraction by convolution, feature restoration by deconvolution, and feature activation by corresponding activation functions. Replacing the above processing in Figure 4 with the corresponding network layers yields the spatial attention module shown in Figure 5.
Figure 5 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure. Referring to Figure 5, the spatial attention module mainly includes a global average pooling (GAP) layer, a first convolutional layer, a ReLU activation layer, a first deconvolution layer, and a Sigmoid activation layer.
In some optional implementations, the feature to be processed is a four-dimensional tensor (b, c, h1, w1), where b is the number of features to be processed, c is the number of channels of the feature to be processed, and h1 and w1 are the height and width of the feature to be processed, respectively.
After the feature to be processed is input into the spatial attention module, the global average pooling layer performs global average pooling on the feature to be processed to obtain the channel context feature. Since the pooling is global, the channel context feature is a global channel context feature, and the tensor size corresponding to the channel context feature is (b, 1, h1, w1). In other words, the global average pooling layer compresses the number of channels of the feature to be processed from c to 1 without changing its feature size in the spatial dimension.
After the channel context feature is obtained, the first convolutional layer convolves the channel context feature to extract the first intermediate feature, whose corresponding tensor size is (b, 1, h2, w2), with h2&lt;h1 and w2&lt;w1. The first convolutional layer can use one convolution kernel to squeeze the channel context feature in the spatial dimension (the height is squeezed from h1 to h2 and the width from w1 to w2) to obtain a first intermediate feature with a smaller spatial size.
Further, the ReLU activation layer performs nonlinear activation on the first intermediate feature to obtain the second intermediate feature, whose corresponding tensor size is (b, 1, h2, w2).
After the second intermediate feature is obtained, the first deconvolution layer deconvolves the second intermediate feature to expand it in the spatial dimension and obtain the third intermediate feature, whose corresponding tensor size is (b, 1, h1, w1). In other words, the first deconvolution expands the height of the second intermediate feature in the spatial dimension from h2 to h1 and the width from w2 to w1 while keeping the number of channels unchanged.
Further, the Sigmoid activation layer performs nonlinear activation and normalization on the third intermediate feature to obtain the spatial attention feature, whose corresponding tensor size is (b, 1, h1, w1).
Through the point-wise multiplication unit, the spatial attention feature is multiplied point by point into the feature to be processed to obtain the target feature, whose corresponding tensor size is (b, c, h1, w1). The target feature is a feature into which the spatial attention information has been fused.
The above data processing is described by taking as an example a feature to be processed of (b, 256, 28, 28), a first convolutional layer corresponding to one convolution kernel with size 3*3 and stride 2, and a first deconvolution corresponding to one convolution kernel with size 3*3 and stride 2. First, the feature to be processed is input into the spatial attention module, and the global average pooling layer aggregates the information of all channel dimensions element by element to obtain a channel context feature of shape (b, 1, 28, 28); the first convolutional layer then performs a Conv3*3, stride-2 convolution to compress the feature in the spatial dimension and obtain a first intermediate feature of shape (b, 1, 14, 14); the ReLU activation layer performs nonlinear activation on the first intermediate feature to obtain a second intermediate feature of shape (b, 1, 14, 14); the first deconvolution layer then performs a TransposedConv3*3, stride-2 deconvolution to expand the feature in the spatial dimension and obtain a third intermediate feature of shape (b, 1, 28, 28); the Sigmoid activation layer performs nonlinear activation and normalization on the third intermediate feature to obtain a spatial attention feature of shape (b, 1, 28, 28); the spatial attention feature is multiplied point by point into the feature to be processed to obtain a target feature of shape (b, 256, 28, 28).
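For illustration, the following PyTorch-style sketch mirrors the Figure 5 pipeline with the shapes used in the example above. The kernel sizes and strides follow the text; the class name, padding, and output_padding values are assumptions chosen so that the 28x28 to 14x14 to 28x28 shape sequence works out.

```python
import torch
import torch.nn as nn

class SpatialAttentionNaive(nn.Module):
    """Sketch of the Figure 5 pipeline: channel GAP -> Conv3x3/s2 -> ReLU
    -> TransposedConv3x3/s2 -> Sigmoid -> point-wise multiplication."""

    def __init__(self):
        super().__init__()
        # padding / output_padding are assumed values for 28x28 -> 14x14 -> 28x28.
        self.conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
        self.deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2,
                                         padding=1, output_padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = x.mean(dim=1, keepdim=True)          # (b, 1, 28, 28) channel context feature
        mid = torch.relu(self.conv(ctx))           # (b, 1, 14, 14) second intermediate feature
        attn = torch.sigmoid(self.deconv(mid))     # (b, 1, 28, 28) spatial attention feature
        return x * attn                            # (b, 256, 28, 28) target feature

x = torch.randn(2, 256, 28, 28)
y = SpatialAttentionNaive()(x)                     # same shape as x
```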
In summary, after the feature to be processed has been processed by the global average pooling layer, the number of channels is only 1, so the subsequent Conv3*3 and TransposedConv3*3 require little computation (taking the feature to be processed (b, 256, 28, 28) as an example, reducing the number of channels by a factor of 256 reduces the amount of computation by at least a factor of 256), which can effectively improve task processing efficiency.
In addition, it should be emphasized that channel squeezing and excitation based on SENet are usually built on fully connected channels, that is, regardless of whether the output channels change, each output channel is computed from all input channels. In the embodiments of the present disclosure, each pixel of the target feature is computed from only 3*3 pixels of the feature to be processed, and the receptive field is about 7*7 (computed with a kernel size of 3*3 and a stride of 2), so the range of action of the spatial attention is relatively limited.
In view of this, in some optional implementations, the size of the above receptive field can be appropriately increased to expand the range of action of the spatial attention and thereby improve task processing accuracy. It should be understood that increasing the size of the receptive field usually increases the amount of computation; in other words, the above increase in task processing accuracy may be obtained at the cost of some processing capacity.
In some optional implementations, the spatial attention module shown in Figure 5 is regarded as a plain version; by improving or strengthening some of its network layers, an enhanced version of the spatial attention module with a larger range of spatial attention can be obtained.
For example, the global average pooling layer in Figure 5 is replaced by a global convolutional layer, and the learnable parameters of the convolution are used to better preserve the information of the feature to be processed.
For example, the first convolutional layer in Figure 5 is replaced by a second convolutional layer and a first channel averaging layer, where the second convolutional layer uses larger convolution kernels and corresponds to multiple output channels, and the first channel averaging layer averages the features of the output channels to obtain the corresponding first intermediate feature. Since the size of the convolution kernel increases, the size of the receptive field increases accordingly, and since the first intermediate feature is determined from the mean of multiple output channels, the accuracy of the first intermediate feature can be improved to some extent.
For example, the first deconvolution layer in Figure 5 is replaced by a second deconvolution layer and a second channel averaging layer, where the second deconvolution layer can use larger convolution kernels and corresponds to multiple output channels, and the second channel averaging layer averages the features of the output channels to obtain the corresponding third intermediate feature. Since the size of the convolution kernel increases, the size of the receptive field increases accordingly, and since the third intermediate feature is determined from the mean of multiple output channels, the accuracy of the third intermediate feature can be improved to some extent.
It should be noted that any one or more of the above improvements can be implemented on the basis of the plain version of the spatial attention module in order to increase the range of action of the spatial attention.
Figure 6 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure, which is an enhanced version of the spatial attention module. Referring to Figure 6, the spatial attention module includes a global convolutional layer, a feature extraction layer, a ReLU activation layer, a feature restoration layer, and a Sigmoid activation layer, where the feature extraction layer includes a second convolutional layer and a first channel averaging layer, and the feature restoration layer includes a second deconvolution layer and a second channel averaging layer.
In some optional implementations, the feature to be processed is a four-dimensional tensor (b, c1, h1, w1), where b is the number of features to be processed, c1 is the number of channels of the feature to be processed, and h1 and w1 are the height and width of the feature to be processed, respectively.
After the feature to be processed is input into the spatial attention module, the global convolutional layer first performs global convolution on the feature to be processed to obtain the channel context feature. Since the convolution is global, the channel context feature is a global channel context feature, and the tensor size corresponding to the channel context feature is (b, 1, h1, w1). In other words, the global convolutional layer compresses the number of channels of the feature to be processed from c1 to 1 without changing its feature size in the spatial dimension, while the learnable parameters in the global convolutional layer better preserve the feature information.
After the channel context feature is obtained, it is processed by the feature extraction layer composed of the second convolutional layer and the first channel averaging layer to obtain the first intermediate feature. For example, the second convolutional layer convolves the channel context feature to obtain fourth intermediate features of multiple channels, and the first channel averaging layer computes the mean of the multiple fourth intermediate features in the channel dimension to obtain the first intermediate feature. The tensor size corresponding to the fourth intermediate features of the multiple channels is (b, c2, h3, w3), and the tensor size corresponding to the first intermediate feature is (b, 1, h3, w3), with 1&lt;c2&lt;c1, h3&lt;h1, and w3&lt;w1. In other words, the second convolutional layer obtains fourth intermediate features of c2 channels from the single-channel channel context feature while squeezing the channel context feature in the spatial dimension (the height is squeezed from h1 to h3 and the width from w1 to w3); the first channel averaging layer computes the mean of the above fourth intermediate features in the channel dimension to obtain the first intermediate feature.
Further, the ReLU activation layer performs nonlinear activation on the first intermediate feature to obtain the second intermediate feature, whose corresponding tensor size is (b, 1, h3, w3).
After the second intermediate feature is obtained, it is processed by the feature restoration layer composed of the second deconvolution layer and the second channel averaging layer to obtain the third intermediate feature. For example, the second deconvolution layer deconvolves the second intermediate feature to obtain fifth intermediate features of multiple channels, and the second channel averaging layer computes the mean of the multiple fifth intermediate features in the channel dimension to obtain the third intermediate feature. The tensor size corresponding to the fifth intermediate features of the multiple channels is (b, c3, h1, w1), the tensor size corresponding to the third intermediate feature is (b, 1, h1, w1), 1&lt;c3&lt;c1, and c3 and c2 may be the same or different. In other words, the second deconvolution layer obtains fifth intermediate features of c3 channels from the single-channel second intermediate feature while expanding the second intermediate feature in the spatial dimension (the height is expanded from h3 to h1 and the width from w3 to w1); the second channel averaging layer computes the mean of the above fifth intermediate features in the channel dimension to obtain the third intermediate feature.
Further, the Sigmoid activation layer performs nonlinear activation and normalization on the third intermediate feature to obtain the spatial attention feature, whose corresponding tensor size is (b, 1, h1, w1).
Through the point-wise multiplication unit, the spatial attention feature is multiplied point by point into the feature to be processed to obtain the target feature, whose corresponding tensor size is (b, c1, h1, w1).
The above is described by taking as an example a feature to be processed of (b, 256, 28, 28), a second convolutional layer corresponding to four convolution kernels (one per channel) with kernel size 7*7, stride 4, and dilation factor 2, and a second deconvolution corresponding to four convolution kernels (one per channel) with kernel size 7*7, stride 4, and dilation factor 2. First, the feature to be processed is input into the spatial attention module, and the global convolutional layer compresses the feature to be processed in the channel dimension to obtain a channel context feature of shape (b, 1, 28, 28); the second convolutional layer then uses convolution kernels corresponding to 4 channels to perform a Conv7*7, stride-4, dilation-2 convolution, expanding the channel context feature into 4 channels in the channel dimension and compressing it in the spatial dimension, to obtain fourth intermediate features of shape (b, 4, 7, 7) (or, equivalently, fourth intermediate features of shape (b, 1, 7, 7) corresponding to 4 channels); the first channel averaging layer computes the mean of the above fourth intermediate features in the channel dimension (that is, the mean over the 4 channels of the feature points at the same spatial position) to obtain a first intermediate feature of size (b, 1, 7, 7); the ReLU activation layer performs nonlinear activation on the first intermediate feature to obtain a second intermediate feature of shape (b, 1, 7, 7); the second deconvolution layer then uses convolution kernels corresponding to 4 channels to perform a TransposedConv7*7, stride-4, dilation-2 deconvolution, expanding the second intermediate feature into 4 channels in the channel dimension and expanding it in the spatial dimension, to obtain fifth intermediate features of shape (b, 4, 28, 28) (or, equivalently, fifth intermediate features of shape (b, 1, 28, 28) corresponding to 4 channels); the second channel averaging layer computes the mean of the above fifth intermediate features in the channel dimension (that is, the mean over the 4 channels of the feature points at the same spatial position) to obtain a third intermediate feature of size (b, 1, 28, 28); the Sigmoid activation layer performs nonlinear activation and normalization on the third intermediate feature to obtain a spatial attention feature of shape (b, 1, 28, 28); the spatial attention feature is multiplied point by point into the feature to be processed to obtain a target feature of shape (b, 256, 28, 28).
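For illustration, the sketch below follows the Figure 6 pipeline with the shapes used in the example above. The kernel size, stride, dilation, and channel counts follow the text; the reading of "global convolution" as a learnable 1x1 convolution, the class name, and the padding/output_padding values (chosen so that 28x28 maps to 7x7 and back) are assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttentionEnhanced(nn.Module):
    """Sketch of the Figure 6 pipeline: global conv over channels -> Conv7x7
    (4 kernels, stride 4, dilation 2) -> channel mean -> ReLU ->
    TransposedConv7x7 (4 kernels, stride 4, dilation 2) -> channel mean ->
    Sigmoid -> point-wise multiplication."""

    def __init__(self, in_channels: int = 256):
        super().__init__()
        # Assumed reading of "global convolution": a learnable 1x1 conv to 1 channel.
        self.global_conv = nn.Conv2d(in_channels, 1, kernel_size=1)
        # padding=5 and output_padding=1 are assumed values for 28x28 -> 7x7 -> 28x28.
        self.conv = nn.Conv2d(1, 4, kernel_size=7, stride=4, dilation=2, padding=5)
        self.deconv = nn.ConvTranspose2d(1, 4, kernel_size=7, stride=4, dilation=2,
                                         padding=5, output_padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ctx = self.global_conv(x)                        # (b, 1, 28, 28) channel context feature
        mid = self.conv(ctx).mean(dim=1, keepdim=True)   # (b, 1, 7, 7) first intermediate feature
        mid = torch.relu(mid)                            # second intermediate feature
        up = self.deconv(mid).mean(dim=1, keepdim=True)  # (b, 1, 28, 28) third intermediate feature
        attn = torch.sigmoid(up)                         # spatial attention feature
        return x * attn                                  # (b, 256, 28, 28) target feature

x = torch.randn(2, 256, 28, 28)
y = SpatialAttentionEnhanced(256)(x)
```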
It should be noted that, when convolution is performed with a 7*7 convolution kernel with a dilation factor of 2, the feature edges need to be zero padded (other padding methods can also be used), and the feature extraction result may be affected by the zero padding, making the result inaccurate. For this reason, a multi-kernel strategy is adopted in the second convolutional layer and the second deconvolution layer: multiple convolution kernels are used to expand the number of channels of the feature, and the influence of the above zero padding is then reduced by computing the mean of the feature in the channel dimension.
It should be emphasized that, taking the feature to be processed (b, 256, 28, 28) as an example, in the processing of the plain version of the spatial attention module the range of action of the spatial attention is about 7*7, which is small relative to the spatial size 28*28 of the feature, whereas in the processing of the enhanced version of the spatial attention module the convolution kernel with dilation factor 2, stride 4, and size 7*7 has a receptive field of about 53*53. In other words, the range of action of the enhanced spatial attention is about 53*53, an effective enlargement over the plain version; even for the largest feature map of the YOLO object detection network (with a spatial size of 608/8=76), a 53*53 receptive field is sufficient.
It should also be noted that although, relative to the spatial size 28*28 of the feature, a convolution kernel with dilation factor 2, stride 4, and size 7*7 is a very large kernel, it has been verified in the related art that very large convolution kernels still deliver good processing effect and efficiency when handling small feature maps, and there is no problem of very large kernels being unable to process small features.
It should be understood that, in the processing of the enhanced version of the spatial attention, although the number of channels is increased in some processing steps, the feature to be processed has already been compressed along the channel dimension into a smaller feature at the very beginning of the processing, so the amount of computation increases only slightly; compared with the accuracy improvement that can be achieved for the task processing result, the added computation is cost-effective.
It should be noted that the spatial attention modules shown in Figures 4-6 above correspond to the SE Block structure in SENet. On the basis of the spatial attention module shown in Figure 3, various modifications can be made to obtain multiple variant structures of SENet, and these variants can likewise be used to carry out the spatial attention mechanism to obtain spatial attention information.
Figure 7 is a schematic diagram of a spatial attention module provided by an embodiment of the present disclosure, which is one of the variants of SENet (namely, the Simplified NL Block). Referring to Figure 7, in this spatial attention module the tensor size corresponding to the feature to be processed is (b, c, h1, w1); the context modelling unit includes a third convolutional layer and a normalization layer, which are used to compress the feature to be processed along the channel dimension to obtain the channel context feature with a corresponding tensor size of (b, 1, h1, w1); the fourth convolutional layer corresponds to the transformation unit and is used to perform feature transformation on the channel context feature to obtain the spatial attention feature with a corresponding tensor size of (b, 1, h1, w1); the fusion unit is implemented by a point-wise multiplier, which multiplies the spatial attention feature point by point into the feature to be processed to obtain the target feature, whose tensor size is (b, c, h1, w1).
It can be seen that the attention information of the feature to be processed in the spatial dimension can also be obtained with the Simplified NL Block structure.
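The rough sketch below follows the Figure 7 description (third convolution plus normalization for context modelling, fourth convolution as the transformation unit, point-wise multiplication as fusion). The use of 1x1 convolutions, the choice of a Softmax over spatial positions as the normalization layer, and the class name are assumptions made for illustration rather than details given in the text.

```python
import torch
import torch.nn as nn

class SpatialAttentionVariant(nn.Module):
    """Rough sketch of the Figure 7 variant as described in the text;
    layer shapes and the Softmax normalization are assumptions."""

    def __init__(self, in_channels: int = 256):
        super().__init__()
        self.third_conv = nn.Conv2d(in_channels, 1, kernel_size=1)   # context modelling
        self.fourth_conv = nn.Conv2d(1, 1, kernel_size=1)             # transformation unit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        ctx = self.third_conv(x)                                       # (b, 1, h, w)
        ctx = torch.softmax(ctx.view(b, 1, h * w), dim=-1).view(b, 1, h, w)  # normalization layer
        attn = self.fourth_conv(ctx)                                   # (b, 1, h, w) spatial attention
        return x * attn                                                # (b, c, h, w) target feature
```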
It should be noted that, in addition to the Simplified NL Block, SENet has many other variants, such as the Global Context Block (GC Block). Whatever the variant, the processing is similar: the feature to be processed is first compressed along the channel dimension to obtain the channel context feature, the channel context feature is then excited in the spatial dimension to obtain the spatial attention feature, and finally the spatial attention feature is fused with the feature to be processed to obtain the target feature.
In some optional implementations, the above spatial attention mechanism can be used in combination with a channel attention mechanism to further improve the accuracy of the task processing result.
In some optional implementations, the processing result may be determined from attention information in the spatial dimension and attention information in the channel dimension, where the attention information in the spatial dimension is obtained based on the spatial attention mechanism and the attention information in the channel dimension is obtained based on the channel attention mechanism.
In other words, compared with obtaining the processing result using only the spatial attention mechanism or only the channel attention mechanism, combining the two mechanisms makes it possible to obtain attention information in the spatial dimension and in the channel dimension at the same time, so the representation effect of the features is further improved, and accordingly the accuracy of the task processing result can also be improved.
Figure 8 is a schematic diagram of a target neural network provided by an embodiment of the present disclosure. Referring to Figure 8, the target neural network includes a first network structure, a spatial attention module, a channel attention module, a fusion module, and a second network structure, where the spatial attention module is a module set based on the spatial attention mechanism and is used to obtain attention information in the spatial dimension, and the channel attention module is a module set based on the channel attention mechanism and is used to obtain attention information in the channel dimension.
In some optional implementations, the data to be processed is input into the target neural network; the first network structure located before the spatial attention module and the channel attention module processes the data to be processed to obtain the feature to be processed, and the feature to be processed serves as the input data of both the spatial attention module and the channel attention module.
For the spatial attention module, the first context modelling unit compresses the feature to be processed along the channel dimension to obtain the channel context feature and inputs the channel context feature into the first transformation unit; the first transformation unit performs feature transformation on the channel context feature to obtain the spatial attention feature and inputs the spatial attention feature into the first fusion unit; the first fusion unit fuses the feature to be processed and the spatial attention feature, for example by point-wise multiplication or point-wise addition, to obtain a first target feature and inputs the first target feature into the fusion module.
Similarly to the spatial attention module, the channel attention module compresses the feature to be processed along the spatial dimension through the second context modelling unit to obtain a spatial context feature and inputs the spatial context feature into the second transformation unit; the second transformation unit performs feature transformation on the spatial context feature to obtain a channel attention feature and inputs the channel attention feature into the second fusion unit; the second fusion unit fuses the feature to be processed and the channel attention feature, for example by point-wise multiplication or point-wise addition, to obtain a second target feature and inputs the second target feature into the fusion module.
The fusion module further fuses the first target feature and the second target feature to obtain a fused feature that contains both the spatial attention information and the channel attention information, and inputs the fused feature into the second network structure; the second network structure performs corresponding data processing based on the fused feature to obtain the processing result.
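As a sketch of the Figure 8 arrangement, the example below runs the same feature to be processed through a spatial attention branch and a channel attention branch in parallel and fuses the two target features. The plain spatial branch mirrors the Figure 5 sketch; the SE-style channel branch (spatial global pooling followed by two fully connected layers), the element-wise-sum fusion module, the padding values, and the class name are illustrative assumptions rather than details given in the text.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Sketch of Figure 8: parallel spatial and channel attention branches
    acting on the same feature, followed by an assumed sum-based fusion."""

    def __init__(self, channels: int = 256):
        super().__init__()
        # Spatial branch (plain version): channel GAP -> Conv3x3/s2 -> ReLU
        # -> TransposedConv3x3/s2 -> Sigmoid (padding values are assumptions).
        self.sp_conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
        self.sp_deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2,
                                            padding=1, output_padding=1)
        # Channel branch (assumed SE-style): spatial GAP -> FC -> ReLU -> FC -> Sigmoid.
        self.ch_excite = nn.Sequential(
            nn.Linear(channels, channels // 16), nn.ReLU(),
            nn.Linear(channels // 16, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # First target feature: spatial attention applied to x.
        sp_ctx = x.mean(dim=1, keepdim=True)
        sp_attn = torch.sigmoid(self.sp_deconv(torch.relu(self.sp_conv(sp_ctx))))
        first_target = x * sp_attn
        # Second target feature: channel attention applied to x.
        ch_attn = self.ch_excite(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        second_target = x * ch_attn
        # Fusion module (assumed here to be an element-wise sum).
        return first_target + second_target

y = DualAttention(256)(torch.randn(2, 256, 28, 28))
```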
In the processing described above, the spatial attention module and the channel attention module act on the same features to be processed, strengthening their feature expression from the spatial dimension and the channel dimension simultaneously. In some optional implementations, the spatial attention module and the channel attention module may instead act on different features to be processed, so that different features are handled in different ways. For example, the spatial attention module acts on first features to be processed to obtain their attention information in the spatial dimension, and the channel attention module acts on second features to be processed to obtain their attention information in the channel dimension.
As mentioned above, the spatial attention module includes a naive version and an enhanced version; likewise, the channel attention module may include a naive version and an enhanced version. In the target neural network, the naive versions of both modules may be used, the enhanced versions of both may be used, or the naive version of one may be combined with the enhanced version of the other; the embodiments of the present disclosure impose no limitation on this.
It should be noted that the spatial attention module and the channel attention module may use any SENet structure (including corresponding variants), and the two modules may use the same SENet structure or different SENet structures; the embodiments of the present disclosure impose no limitation on this.
It should also be noted that, as can be seen from the data processing described above, the processing of the spatial attention module and that of the channel attention module are relatively independent, and each can be performed without relying on the other's result. The processing of the two modules may therefore be executed simultaneously or sequentially, and the embodiments of the present disclosure impose no limitation on this.
In some optional implementations, the spatial attention module and the channel attention module may be carried by different hardware devices or by the same hardware device. When carried by the same hardware device, their processing may be executed sequentially, or two processes may be established so that the processing is executed simultaneously in the two processes. When carried by different hardware devices, their processing may be executed simultaneously by the respective hardware devices or executed sequentially.
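Purely as an illustration of the concurrent option, the sketch below submits the two relatively independent branches at the same time using Python's standard concurrent.futures module; the module objects and input tensor are placeholders, and in practice the parallelism would be provided by the underlying hardware rather than by Python threads.

    from concurrent.futures import ThreadPoolExecutor

    def run_branches_concurrently(spatial_module, channel_module, features):
        # Run the spatial and channel branches on the same features at the same time.
        with ThreadPoolExecutor(max_workers=2) as pool:
            spatial_future = pool.submit(spatial_module, features)
            channel_future = pool.submit(channel_module, features)
            return spatial_future.result(), channel_future.result()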
A second aspect of the embodiments of the present disclosure provides a neural network model.
Figure 9 is a schematic diagram of a neural network model provided by an embodiment of the present disclosure. Referring to Figure 9, the neural network model is a model constructed based on the model parameters of a target neural network, where the target neural network is the target neural network of any of the embodiments of the present disclosure.
In some optional implementations, the neural network model can be used to perform at least one of an image processing task, a speech processing task, a text processing task, and a video processing task. Whatever task the neural network model performs, it needs to obtain the attention information of features in the spatial dimension during execution. Accordingly, while performing a task the neural network model carries out the following steps: compressing features from the channel dimension, exciting the correlation, in the spatial dimension, of the features that have undergone channel compression, and obtaining the attention information of the features in the spatial dimension. In other words, the structure of the neural network model may differ for different types of tasks, but however its structure changes, it includes a functional module for executing the spatial attention mechanism.
In some optional implementations, an initial neural network model is built according to the task to be processed. In the initial neural network model, at least some of the model parameters are initial parameters, and executing the task directly with the initial model yields low processing accuracy. The model parameters of the target neural network are therefore used to update the corresponding parameters in the initial neural network model to obtain a neural network model with higher accuracy.
In some optional implementations, the process of building a neural network model based on the model parameters of the target neural network can be implemented through model training.
For example, first, an initial neural network model is built; in the initial model, the model parameters are initialization parameters set according to experience or statistics, or set randomly, and the initial model cannot be used directly to perform the task. Next, a corresponding training set is obtained, and the initial neural network model is trained on the training set to obtain a training result. Then, whether to continue training the model is determined according to the training result and a preset iteration condition. If it is determined that training should continue, the current model parameters have not yet reached the optimum and there is room for further optimization; the model parameters are therefore updated according to this round's training result, and the updated model is trained iteratively on the training set until it is determined that training should stop, yielding a trained neural network model. In the trained neural network model, the model parameters correspond to the model parameters of the target neural network.
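A minimal training-loop sketch of this procedure is shown below, assuming a PyTorch model, a data loader, and a fixed epoch count standing in for the preset iteration condition; the loss function and hyper-parameters are placeholders rather than values taken from the embodiment.

    import torch
    import torch.nn as nn

    def train_model(model, train_loader, epochs=10, lr=1e-3):
        """Train an initial model until the preset iteration condition is met."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        for _ in range(epochs):                          # preset iteration condition
            for inputs, labels in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(inputs), labels)  # training result for this batch
                loss.backward()
                optimizer.step()                         # update the model parameters for the next round
        return model                                     # parameters now correspond to those of the target network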
It should be noted that, after the trained neural network model is obtained on the training set, the model may also be verified and corrected on a validation set; similarly, the model may be evaluated on a test set. The embodiments of the present disclosure impose no limitation on how the neural network model is obtained.
It can be understood that the method embodiments mentioned above in the present disclosure can be combined with one another to form combined embodiments without violating their principles and logic; owing to space limitations, this is not elaborated further in the present disclosure. Those skilled in the art can understand that, in the methods of the specific implementations described above, the specific execution order of the steps should be determined by their functions and possible internal logic.
In addition, the present disclosure also provides a data processing apparatus, an electronic device, and a computer-readable storage medium, all of which can be used to implement any of the data processing methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section, which are not repeated here.
Figure 10 is a block diagram of a data processing apparatus provided by an embodiment of the present disclosure.
Referring to Figure 10, an embodiment of the present disclosure provides a data processing apparatus, which includes the following modules.
The data processing module 101 is used to input data to be processed into a target neural network, perform data processing based on the spatial attention mechanism of the squeeze-and-excitation framework, and obtain a processing result.
The spatial attention mechanism is used to compress features from the channel dimension, excite the correlation, in the spatial dimension, of the features that have undergone channel compression, and obtain the attention information of the features in the spatial dimension.
In some optional implementations, the data processing module includes a first determination submodule, a compression submodule, a conversion submodule, a fusion submodule, and a second determination submodule. The first determination submodule is used to determine features to be processed according to the data to be processed; the compression submodule is used to compress the features to be processed from the channel dimension to obtain channel context features; the conversion submodule is used to perform feature conversion on the channel context features to obtain spatial attention features; the fusion submodule is used to fuse the features to be processed with the spatial attention features to obtain target features; and the second determination submodule is used to determine the processing result according to the target features. The features to be processed, the channel context features, the spatial attention features, and the target features have the same feature size in the spatial dimension; the features to be processed and the target features have the same first feature size in the channel dimension; the channel context features and the spatial attention features have the same second feature size in the channel dimension; and the first feature size is larger than the second feature size.
Mapping the above functional submodules to the target neural network shown in Figure 3, the first determination submodule corresponds to the first network structure, the compression submodule corresponds to the context modeling unit, the conversion submodule corresponds to the conversion unit, the fusion submodule corresponds to the fusion unit, and the second determination submodule corresponds to the second network structure. The compression submodule, the conversion submodule, and the fusion submodule also include finer-grained functional units; for the relevant content, refer to the corresponding descriptions in the embodiments of the present disclosure, which are not repeated here.
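The dimension constraints described for these submodules can be checked with a short, purely illustrative shape walk-through; the tensor sizes below are arbitrary examples, not values from the embodiment.

    import torch

    x = torch.randn(1, 64, 32, 32)         # features to be processed: first feature size 64 in the channel dimension
    ctx = x.mean(dim=1, keepdim=True)      # channel context features: second feature size 1 in the channel dimension
    att = torch.sigmoid(ctx)               # spatial attention features: same spatial size of 32*32
    target = x * att                       # target features: first feature size 64 restored by broadcasting
    print(ctx.shape, target.shape)         # torch.Size([1, 1, 32, 32]) torch.Size([1, 64, 32, 32])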
Figure 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Referring to Figure 11, an embodiment of the present disclosure provides an electronic device, which includes: at least one processor 1101; at least one memory 1102; and one or more I/O interfaces 1103 connected between the processor 1101 and the memory 1102. The memory 1102 stores one or more computer programs executable by the at least one processor 1101, and the one or more computer programs are executed by the at least one processor 1101 so that the at least one processor 1101 can execute the data processing method described above.
Figure 12 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Referring to Figure 12, an embodiment of the present disclosure provides an electronic device, which includes a plurality of processing cores 1201 and a network-on-chip 1202, where the plurality of processing cores 1201 are all connected to the network-on-chip 1202, and the network-on-chip 1202 is used to exchange data among the plurality of processing cores and with the outside.
One or more instructions are stored in one or more of the processing cores 1201, and the one or more instructions are executed by the one or more processing cores 1201 so that the one or more processing cores 1201 can execute the data processing method described above.
In some embodiments, the electronic device may be a brain-inspired chip. Because a brain-inspired chip can adopt vectorized computation and needs to load parameters such as the weight information of the neural network model through an external memory, for example a double data rate (DDR) synchronous dynamic random access memory, the batch processing adopted in the embodiments of the present disclosure achieves relatively high operational efficiency.
Embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, where the computer program implements the data processing method described above when executed by a processor or a processing core. The computer-readable storage medium may be a volatile or non-volatile computer-readable storage medium.
Embodiments of the present disclosure also provide a computer program product, including computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code. When the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the data processing method described above.
Those of ordinary skill in the art can understand that all or some of the steps in the methods disclosed above, and the functional modules/units in the systems and apparatuses disclosed above, can be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable program instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), flash memory or other memory technology, portable compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable program instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery medium.
The computer-readable program instructions described here may be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions, thereby implementing various aspects of the present disclosure.
The computer program product described here may be implemented by hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).
Aspects of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, so that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, a programmable data processing apparatus, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions implementing aspects of the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or other devices, so that a series of operational steps are performed on the computer, the other programmable data processing apparatus, or the other devices to produce a computer-implemented process, so that the instructions executed on the computer, the other programmable data processing apparatus, or the other devices implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of instructions, which contains one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a special-purpose hardware-based system that performs the specified functions or actions, or by a combination of special-purpose hardware and computer instructions.
Example embodiments have been disclosed here, and although specific terms are employed, they are used and should be interpreted in a generic descriptive sense only, and not for the purpose of limitation. In some instances, it will be apparent to those skilled in the art that, unless explicitly stated otherwise, the features, characteristics, and/or elements described in connection with a particular embodiment may be used alone, or in combination with the features, characteristics, and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and details may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (19)

  1. A data processing method, comprising:
    inputting data to be processed into a target neural network, performing data processing based on a spatial attention mechanism of a squeeze-and-excitation framework, and obtaining a processing result;
    wherein the spatial attention mechanism is used to compress features from a channel dimension, excite the correlation, in a spatial dimension, of the features that have undergone channel compression, and obtain attention information of the features in the spatial dimension.
  2. The method according to claim 1, wherein the inputting of the data to be processed into the target neural network, performing data processing based on the spatial attention mechanism of the squeeze-and-excitation framework, and obtaining the processing result comprises:
    determining features to be processed according to the data to be processed;
    performing feature compression on the features to be processed from the channel dimension to obtain channel context features;
    performing feature conversion on the channel context features to obtain spatial attention features;
    performing feature fusion on the features to be processed and the spatial attention features to obtain target features;
    determining the processing result according to the target features;
    wherein the features to be processed, the channel context features, the spatial attention features, and the target features have the same feature size in the spatial dimension; the features to be processed and the target features have the same first feature size in the channel dimension; the channel context features and the spatial attention features have the same second feature size in the channel dimension; and the first feature size is larger than the second feature size.
  3. The method according to claim 2, wherein the performing of feature compression on the features to be processed from the channel dimension to obtain the channel context features comprises:
    performing pooling processing on the features to be processed in the channel dimension to obtain the channel context features;
    or,
    performing convolution processing on the features to be processed in the channel dimension to obtain the channel context features.
  4. The method according to claim 3, wherein the performing of pooling processing on the features to be processed in the channel dimension to obtain the channel context features comprises:
    performing global average pooling on the features to be processed in the channel dimension to obtain the channel context features with a feature scale of 1 in the channel dimension;
    and the performing of convolution processing on the features to be processed in the channel dimension to obtain the channel context features comprises:
    performing global convolution on the features to be processed in the channel dimension to obtain the channel context features with a feature scale of 1 in the channel dimension.
  5. The method according to claim 2, wherein the performing of feature conversion on the channel context features to obtain the spatial attention features comprises:
    performing feature extraction on the channel context features in the spatial dimension to obtain first intermediate features;
    performing activation processing on the first intermediate features to obtain second intermediate features;
    performing feature restoration processing on the second intermediate features to obtain third intermediate features;
    performing activation processing on the third intermediate features to obtain the spatial attention features.
  6. The method according to claim 5, wherein the performing of feature extraction on the channel context features in the spatial dimension to obtain the first intermediate features comprises:
    performing first convolution processing on the channel context features in the spatial dimension to obtain the first intermediate features;
    or,
    performing second convolution processing on the channel context features in the spatial dimension to obtain fourth intermediate features corresponding to a plurality of channels, and determining an average value of the plurality of fourth intermediate features in the channel dimension to obtain the first intermediate features.
  7. The method according to claim 6, wherein the first convolution corresponds to one convolution kernel, the size of the convolution kernel being 3*3 with a stride of 2;
    and the second convolution corresponds to four convolution kernels, each convolution kernel corresponding to one channel, the size of each convolution kernel being 7*7 with a stride of 4 and a dilation coefficient of 2.
  8. The method according to claim 5, wherein the performing of feature restoration processing on the second intermediate features to obtain the third intermediate features comprises:
    performing first deconvolution processing on the second intermediate features to obtain the third intermediate features;
    or,
    performing second deconvolution processing on the second intermediate features to obtain fifth intermediate features corresponding to a plurality of channels;
    obtaining the third intermediate features based on an average value of the plurality of fifth intermediate features in the channel dimension.
  9. The method according to claim 8, wherein the first deconvolution corresponds to one convolution kernel, the size of the convolution kernel being 3*3 with a stride of 2;
    and the second deconvolution corresponds to four convolution kernels, each convolution kernel corresponding to one channel, the size of each convolution kernel being 7*7 with a stride of 4 and a dilation coefficient of 2.
  10. The method according to claim 5, wherein the performing of activation processing on the first intermediate features to obtain the second intermediate features comprises:
    performing nonlinear activation on the first intermediate features based on a rectified linear function to obtain the second intermediate features;
    and the performing of activation processing on the third intermediate features to obtain the spatial attention features comprises:
    performing nonlinear normalized activation on the third intermediate features based on a sigmoid function to obtain the spatial attention features.
  11. The method according to claim 2, wherein the performing of feature fusion on the features to be processed and the spatial attention features to obtain the target features comprises:
    adding the features to be processed and the spatial attention features point by point to obtain the target features;
    or,
    multiplying the features to be processed and the spatial attention features point by point to obtain the target features.
  12. The method according to claim 1, wherein the target neural network further comprises a channel attention mechanism based on the squeeze-and-excitation framework;
    after the inputting of the data to be processed into the target neural network, the method further comprises:
    performing data processing based on the channel attention mechanism;
    wherein the channel attention mechanism is used to compress features from the spatial dimension, excite the correlation, in the channel dimension, of the features that have undergone spatial compression, and obtain attention information of the features in the channel dimension.
  13. The method according to claim 12, wherein the processing result is determined by the attention information in the spatial dimension and the attention information in the channel dimension, the attention information in the spatial dimension being obtained based on the spatial attention mechanism, and the attention information in the channel dimension being obtained based on the channel attention mechanism.
  14. The method according to any one of claims 1-13, wherein the target neural network is used to perform at least one of an image processing task, a speech processing task, a text processing task, and a video processing task.
  15. A neural network model, wherein the neural network model is a model constructed based on model parameters of a target neural network,
    wherein the target neural network is the target neural network according to any one of claims 1-14.
  16. A data processing apparatus, comprising:
    a data processing module, configured to input data to be processed into a target neural network, perform data processing based on a spatial attention mechanism of a squeeze-and-excitation framework, and obtain a processing result;
    wherein the spatial attention mechanism is used to compress features from a channel dimension, excite the correlation, in a spatial dimension, of the features that have undergone channel compression, and obtain attention information of the features in the spatial dimension.
  17. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor;
    wherein the memory stores one or more computer programs executable by the at least one processor, and the one or more computer programs are executed by the at least one processor so that the at least one processor can execute the data processing method according to any one of claims 1-14.
  18. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the data processing method according to any one of claims 1-14.
  19. A computer program product, comprising computer-readable code, or a non-volatile computer-readable storage medium carrying the computer-readable code, wherein, when the computer-readable code runs in a processor of an electronic device, the processor in the electronic device executes the data processing method according to any one of claims 1-14.
PCT/CN2023/111669 2022-08-09 2023-08-08 Data processing method and apparatus, neural network model, device, and medium WO2024032585A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210948062.X 2022-08-09
CN202210948062.XA CN115034375B (en) 2022-08-09 2022-08-09 Data processing method and device, neural network model, equipment and medium

Publications (1)

Publication Number Publication Date
WO2024032585A1 true WO2024032585A1 (en) 2024-02-15

Family

ID=83130537

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/111669 WO2024032585A1 (en) 2022-08-09 2023-08-08 Data processing method and apparatus, neural network model, device, and medium

Country Status (2)

Country Link
CN (1) CN115034375B (en)
WO (1) WO2024032585A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115034375B (en) * 2022-08-09 2023-06-27 北京灵汐科技有限公司 Data processing method and device, neural network model, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism
CN112215337A (en) * 2020-09-30 2021-01-12 江苏大学 Vehicle trajectory prediction method based on environment attention neural network model
CN114092764A (en) * 2021-11-19 2022-02-25 扬州大学 YOLOv5 neural network vehicle detection method added with attention mechanism
WO2022072659A1 (en) * 2020-10-01 2022-04-07 Beijing Dajia Internet Information Technology Co., Ltd. Video coding with neural network based in-loop filtering
CN114549538A (en) * 2022-02-24 2022-05-27 杭州电子科技大学 Brain tumor medical image segmentation method based on spatial information and characteristic channel
CN115034375A (en) * 2022-08-09 2022-09-09 北京灵汐科技有限公司 Data processing method and device, neural network model, device and medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11403486B2 (en) * 2019-11-13 2022-08-02 Huawei Technologies Co., Ltd. Methods and systems for training convolutional neural network using built-in attention
CN111310764B (en) * 2020-01-20 2024-03-26 上海商汤智能科技有限公司 Network training method, image processing device, electronic equipment and storage medium
CN111274999B (en) * 2020-02-17 2024-04-19 北京迈格威科技有限公司 Data processing method, image processing device and electronic equipment
US20230290134A1 (en) * 2020-09-25 2023-09-14 Intel Corporation Method and system of multiple facial attributes recognition using highly efficient neural networks
CN113111970B (en) * 2021-04-30 2023-12-26 陕西师范大学 Method for classifying images by constructing global embedded attention residual network
CN114202502A (en) * 2021-08-30 2022-03-18 浙大宁波理工学院 Thread turning classification method based on convolutional neural network
CN114359164A (en) * 2021-12-10 2022-04-15 中国科学院深圳先进技术研究院 Method and system for automatically predicting Alzheimer disease based on deep learning
CN114842185A (en) * 2022-03-21 2022-08-02 昭通亮风台信息科技有限公司 Method, device, equipment and medium for identifying fire
CN114782737A (en) * 2022-03-24 2022-07-22 福建亿榕信息技术有限公司 Image classification method, device and storage medium based on improved residual error network
CN114781513A (en) * 2022-04-22 2022-07-22 北京灵汐科技有限公司 Data processing method and device, equipment and medium


Also Published As

Publication number Publication date
CN115034375B (en) 2023-06-27
CN115034375A (en) 2022-09-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23851806

Country of ref document: EP

Kind code of ref document: A1