CN116033159A - Feature processing method, image coding method and device - Google Patents

Feature processing method, image coding method and device

Publication number: CN116033159A
Authority: CN (China)
Prior art keywords: feature, sub-features, processing, channels
Legal status: Pending
Application number: CN202211575266.XA
Other languages: Chinese (zh)
Inventors: 粘春湄, 戴亮, 江东, 林聚财, 殷俊
Current Assignee: Zhejiang Dahua Technology Co Ltd
Original Assignee: Zhejiang Dahua Technology Co Ltd
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202211575266.XA
Publication of CN116033159A


Abstract

The application discloses a feature processing method, an image coding method and a device. The feature processing method comprises the following steps: dividing a feature into a plurality of sub-features by channel based on the information distribution among the channels of the feature; performing convolution processing on each sub-feature; and splicing the convolution processing results of all the sub-features to obtain the target feature. The present application can improve the effects of image processing, video processing and/or audio processing.

Description

Feature processing method, image coding method and device
Technical Field
The present invention relates to the field of image encoding and decoding technologies, and in particular, to a feature processing method, an image encoding method, and an image encoding device.
Background
Convolutional neural networks are a type of neural network that has been widely used and developed in the fields of computer vision (CV) and image processing because of its excellent feature extraction capability. Such networks perform well in image processing tasks such as image segmentation, object detection, and image classification, as well as in video processing and audio processing tasks. However, during long-term development, the inventors of the present application found that existing convolution methods have insufficient feature expression effects, so that the effects of image processing, video processing and/or audio processing are poor.
Disclosure of Invention
The application provides a feature processing method, an image coding method and an image coding device, which can improve the feature expression effect of convolution processing so as to improve the effect of image processing, video processing and/or audio processing by utilizing the convolution result.
To achieve the above object, the present application provides a feature processing method, including:
dividing the features into a plurality of sub-features according to the channels based on the information distribution situation among the channels in the features;
carrying out convolution processing on each sub-feature;
and splicing convolution processing results of all the sub-features to obtain target features of the features.
To achieve the above object, the present application further provides an image encoding method, including:
processing the image to obtain a first feature of the image;
performing grouped convolution on the first feature based on the above feature processing method;
encoding the image based on the target feature of the first feature.
To achieve the above object, the present application also provides an encoder including a processor; the processor is configured to execute instructions to implement the steps of the above-described method.
To achieve the above object, the present application also provides a computer readable storage medium storing instructions/program data capable of being executed to implement the above method.
According to the feature processing method of the present application, based on the information distribution among the channels of a feature, the feature is divided into a plurality of sub-features by channel; the plurality of sub-features are convolved respectively, and the convolution results are spliced to obtain the target feature of the feature. Because the division equalizes the amount of information carried by the different sub-features on the basis of the inter-channel information distribution, the convolution module corresponding to each sub-feature can learn the image features, each convolution module can extract the information in the image features, and the convolution parameters in the grouped convolution model can fully learn the features. The feature expression effect of the image features can thereby be improved, and the effects of image processing, video processing and/or audio processing can be improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a schematic flow chart of an embodiment of a feature processing method of the present application;
FIG. 2 is a schematic diagram of one embodiment of a feature processing method of the present application;
FIG. 3 is a schematic diagram of another embodiment of the feature processing method of the present application;
FIG. 4 is a schematic diagram of yet another embodiment of the feature processing method of the present application;
FIG. 5 is a schematic diagram of one embodiment of sub-feature reordering in the feature handling method of the present application;
FIG. 6 is a schematic diagram of one embodiment of sub-feature channel shuffling in the feature processing method of the present application;
FIG. 7 is a flow chart of an image encoding method;
FIG. 8 is a flow chart of another image encoding method;
FIG. 9 is a flow chart of an embodiment of an image encoding method of the present application;
FIG. 10 is a schematic diagram illustrating an image encoding network according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an embodiment of an encoder of the present application;
FIG. 12 is a schematic diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The following describes the technical solutions in the embodiments of the present application clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the present disclosure without undue burden are within the scope of the present disclosure. In addition, the term "or" as used herein refers to a non-exclusive "or" (i.e., "and/or") unless otherwise indicated. Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments may be combined with one or more other embodiments to form new embodiments.
In the related art, with uniform channel group convolution, a feature with N×M channels in total and dimensions (N×M)×H×W (H and W being the height and width, respectively) is uniformly divided into N parts, each part having M channels, i.e., dimensions M×H×W. Each of the N parts corresponds to one set of convolution parameters, so that multiple sets of convolution parameters learn the feature and represent it efficiently.
However, some channels of a feature carry efficient channel information while others carry inefficient channel information; that is, the channel information distribution is non-uniform. The uniform channel grouping convolution method does not consider this non-uniformity of the feature channel information distribution, and its feature expression effect is therefore insufficient.
On this basis, the present application provides a feature processing method that divides a feature into a plurality of sub-features by channel based on the information distribution among the channels of the image feature, convolves the sub-features respectively, and splices the convolution results to obtain the target feature of the feature. The information distribution among the channels is thus used when grouping the feature; that is, the non-uniformity of the channel information distribution is fully exploited when the feature is convolved in groups.
Specifically, as shown in fig. 1, a feature processing method proposed in the present application may include the following steps. It should be noted that the following step numbers are only for simplifying the description, and are not intended to limit the execution order of the steps, and the execution order of the steps of the present embodiment may be arbitrarily changed without departing from the technical idea of the present application.
S101: based on the information distribution situation among the channels in the characteristics, the characteristics are divided into a plurality of sub-characteristics according to the channels.
The method can divide a feature into a plurality of sub-features by channel based on the information distribution among the channels of the feature, convolve the sub-features respectively, and splice the convolution results of the sub-features to obtain the target feature of the feature.
The above-described features may be extracted based on at least one of image, video, and audio information.
In the case where the feature is extracted based on image information, the feature may be a limb feature, a skeleton feature, a contour feature, a color feature, or the like of an object such as a person.
In the case where the feature is extracted based on video information, the feature may be a gait feature of an object such as a person.
Optionally, the inter-channel information distribution refers to the distribution of at least one kind of information among the different channels of the feature; that is, it may refer to the distribution of the information amount of that information among the channels. Taking image features as an example, the image information amount of at least one channel of a feature may refer to the proportion of the image that can be restored from that channel. For example, if the feature has 4 channels and 40% of the image can be restored from the first channel, 30% from the second, 20% from the third, and 10% from the fourth, then the image information amounts of the four channels are 40%, 30%, 20%, and 10% in order, and the inter-channel information distribution of the feature is (40%, 30%, 20%, 10%).
In this embodiment, the feature is divided into a plurality of sub-features by channel based on the inter-channel information distribution in order to equalize the information amounts of the different sub-features, so that the convolution module corresponding to each sub-feature can learn the feature. This helps each convolution module extract the information in the feature, lets the convolution parameters in the grouped convolution model fully learn the feature, improves the feature expression effect, and thereby improves the processing effect for images, audio or video. Illustratively, assuming the image feature is divided into four sub-features according to step S101, the information amounts of the different sub-features may be equalized such that the difference between the image information amount of each of the four sub-features and 25% is within a difference threshold. The difference threshold may be set according to the actual situation and is not limited herein; it may be, for example, 1%, 5%, or 7%.
In one implementation, the engineer may manually determine the information distribution situation between the channels in the feature according to the domain knowledge, and the engineer may divide the feature according to the determined information distribution situation between the channels in the feature.
Illustratively, assume that when engineers analyze the feature, the efficient channels (i.e., channels whose image information amount is above a threshold) are mostly concentrated in the first few channels of the feature, while the inefficient channels (i.e., channels whose image information amount is below the threshold) are mostly concentrated in the last few channels; the group size can then be gradually enlarged from front to back. As shown in fig. 2, the feature is divided by channel in order from top to bottom (i.e., from the front channels to the back channels), where M1, M2, …, Mn are the channel numbers of the sub-features, M1 < M2 < … < Mn, and Σ(M1, M2, …, Mn) = N×M. The threshold may be set according to the actual situation and is not limited herein.
As can be seen from the above example, the feature is preferably divided according to the degree of concentration of the efficient channels. The plurality of sub-features includes a first sub-feature and a second sub-feature; the number of channels of the first sub-feature is less than that of the second sub-feature, and the first sub-feature contains more efficient channels than the second sub-feature. In this way the image information amount is balanced among the different sub-features, each set of convolution parameters can learn effective information of the feature, and the convolution parameters in the grouped convolution model can fully learn the feature, so that the grouped convolution represents the feature efficiently.
In other words, the number of channels of a sub-feature is inversely related to the concentration of efficient channels in that sub-feature: the more numerous and more concentrated the efficient channels in a sub-feature, the fewer channels the sub-feature has; conversely, the fewer and less concentrated the efficient channels, the more channels the sub-feature has.
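As a concrete illustration of such a non-uniform, front-to-back split, the following is a minimal sketch assuming PyTorch tensors; the group sizes are hypothetical values chosen only to sum to the channel total, not values from this application.

```python
import torch

def split_by_channels(feature: torch.Tensor, group_sizes: list) -> list:
    """Split a (B, C, H, W) feature along its channel axis into sub-features."""
    assert sum(group_sizes) == feature.shape[1], "group sizes must cover all channels"
    return list(torch.split(feature, group_sizes, dim=1))

# front groups (concentrated efficient channels) get fewer channels: M1 < M2 < M3 < M4
feature = torch.randn(1, 192, 32, 32)                # N*M = 192 channels in total
subs = split_by_channels(feature, [16, 32, 64, 80])
print([s.shape[1] for s in subs])                    # [16, 32, 64, 80]
```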
In another implementation manner, the feature may be processed by using a deep learning method to determine the number of channels of each of the plurality of sub-features; the feature is divided into a plurality of sub-features according to the number of channels of each of the plurality of sub-features. It will be appreciated that the deep learning method is also based on the number of channels per sub-feature determined by the inter-channel information distribution in the feature.
Optionally, in this implementable manner, processing the feature using a deep learning method, determining the number of channels of each of the plurality of sub-features may include: carrying out convolution processing on the features to obtain first intermediate features, wherein the number of channels of the first intermediate features is equal to the total number of sub-features; determining a probability for each of the plurality of sub-features based on the first intermediate feature; multiplying the probability of each sub-feature by the total number of channels of the feature to obtain the number of channels of each sub-feature.
Wherein determining the probability of each of the plurality of sub-features based on the first intermediate feature may include: downsampling the first intermediate feature to obtain a first feature vector of size G×1×1, where G is equal to the total number of sub-features; and normalizing the feature values in the first feature vector to obtain the probability of each of the plurality of sub-features. The first intermediate feature may be downsampled by pooling or convolution, etc.
For example, as shown in fig. 3, the total number G of sub-feature groups may be preset; after the feature passes through the Conv_G convolution layer, a first intermediate feature of size G×H×W is output, the probability of each of the G sub-features is obtained from the first intermediate feature, and the number of channels to allocate to each sub-feature is then obtained from the total number of channels of the feature and the corresponding probability. The probabilities may be obtained by downsampling through a pooling layer and then normalizing, yielding probabilities P1 to PG that sum to 1.
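A minimal sketch of this branch, assuming PyTorch, might look as follows; the name conv_g mirrors the Conv_G layer in fig. 3, but the kernel size and the rounding step are illustrative assumptions (in practice the rounded counts may need a small correction so they sum exactly to the channel total).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupSizePredictor(nn.Module):
    def __init__(self, in_channels: int, num_groups: int):
        super().__init__()
        # Conv_G: map the C input channels to G channels (1x1 kernel assumed)
        self.conv_g = nn.Conv2d(in_channels, num_groups, kernel_size=1)

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        g_map = self.conv_g(feature)                      # first intermediate feature: G x H x W
        vec = F.adaptive_avg_pool2d(g_map, 1).flatten(1)  # pooled G x 1 x 1 feature vector
        probs = torch.softmax(vec, dim=1)                 # normalize: P1..PG sum to 1
        counts = probs * feature.shape[1]                 # probability x total channel number
        return counts.round().long()                      # per-group channel numbers

predictor = GroupSizePredictor(in_channels=192, num_groups=8)
print(predictor(torch.randn(1, 192, 32, 32)))            # e.g. tensor([[23, 25, ...]])
```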
In yet another implementation manner, a channel number basic value of each of the plurality of sub-features may be preset; processing the characteristics by using a deep learning method to determine the channel quantity offset value of each of the plurality of sub-characteristics; adding the channel number basic value and the channel number offset value of each sub-feature to obtain the channel number of each sub-feature; the feature is divided into the plurality of sub-features according to the number of channels of each of the plurality of sub-features.
In one example, for different features with the same number of groups, the channel number base value of each sub-feature may be the same. The channel number base values obtained for that grouping number can thus be set directly in the grouped convolution module.
In another example, for different features with the same number of groups, the channel number base value of each sub-feature may differ.
Optionally, processing the feature by using a deep learning method to determine the channel number offset value of each of the plurality of sub-features may include: convolving the feature to obtain a second intermediate feature, wherein the number of channels of the second intermediate feature is equal to the total number of sub-features; determining an offset probability for each of the plurality of sub-features based on the second intermediate feature; and multiplying the offset probability of each sub-feature by the offset amount to obtain the channel number offset value of each sub-feature.
Optionally, determining the offset probability of each of the plurality of sub-features based on the second intermediate feature includes: downsampling the second intermediate feature to obtain a second feature vector of size G×1×1, where G is equal to the total number of sub-features; and processing the feature values in the second feature vector through a tanh activation function to obtain the offset probability of each of the plurality of sub-features. The second intermediate feature may be downsampled by pooling or convolution, etc.
The range of the offset probability can be between (-1, 1) by the tanh activation function.
Alternatively, the sum of the offset probabilities of all the sub-features may be equal to 0, so that when the sum of the channel number base values of all the sub-features equals the total number of channels of the feature, the sum of the channel numbers determined from the offset probabilities also equals the total number of channels, and the feature is divided by channels completely. Of course, in another embodiment, the sum of the offset probabilities need not equal 0; it suffices that the product of the sum of the offset probabilities and the offset amount, added to the sum of the channel number base values of all the sub-features, equals the total number of channels of the feature.
The offset amount Bias may be preset, and the maximum range of the channel number offset value is then (−Bias, +Bias). Bias may be, for example, 5 or 8.
In a specific example, as shown in fig. 4, assume that the total number of sub-features to be grouped is preset to G = 8, and the dimension of the feature is 192×H×W (where 192 is the total number of channels of the input feature); the channel number base values of the 8 sub-features are preset to 12, 16, 20, 20, 26, 30, 34 and 34, respectively, and the offset amount is 5. The feature is processed by the deep learning method, and the determined channel number offset values of the 8 sub-features are −3, −2, +1, +1, +3, −1, 0 and +3, respectively. The channel numbers of the 8 sub-features determined from the above base values and offset values are therefore 9, 14, 21, 21, 29, 29, 34 and 37. The feature can then be divided according to these channel numbers, and the 8 sub-features are convolved and spliced by channel to output the feature.
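The base-value-plus-offset scheme of this example can be sketched as follows, assuming PyTorch; the 1×1 convolution and the rounding are illustrative assumptions, while the base values, Bias = 5, and G = 8 follow the worked example above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetGroupSizePredictor(nn.Module):
    def __init__(self, in_channels: int, base_values: list, bias_amount: float):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, len(base_values), kernel_size=1)
        self.register_buffer("base", torch.tensor(base_values))
        self.bias_amount = bias_amount                       # preset offset amount Bias

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        g_map = self.conv(feature)                           # second intermediate feature: G x H x W
        vec = F.adaptive_avg_pool2d(g_map, 1).flatten(1)     # second feature vector, G x 1 x 1
        offset_prob = torch.tanh(vec)                        # offset probabilities in (-1, 1)
        offsets = (offset_prob * self.bias_amount).round()   # offset values in (-Bias, +Bias)
        return (self.base + offsets).long()                  # base value + offset value per group

predictor = OffsetGroupSizePredictor(192, [12, 16, 20, 20, 26, 30, 34, 34], bias_amount=5)
print(predictor(torch.randn(1, 192, 32, 32)))
```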
The branch corresponding to the deep learning method can be trained, so that the values it produces divide the feature according to the concentration of the efficient channels; the image information amount is thereby equalized among the different sub-features, and each set of convolution parameters can learn image information.
In each of the above implementations, the number of groups of the feature may be set in advance, and the feature is then divided based on the above implementations and the set number of groups.
The number of groups may be set according to the actual situation and is not limited herein; it may be, for example, 5, 8 or 10.
Alternatively, based on the inter-channel information distribution in the feature, at least some of the plurality of sub-features may be assigned different numbers of channels; that is, the feature may be grouped non-uniformly. In many cases, among the sub-features so divided, the number of channels of the nth sub-feature is less than or equal to that of the (n+1)th sub-feature, where the order of the sub-features follows the channel order in the feature. For example, if a feature is divided into 8 sub-features, the channel numbers may increase strictly from the first sub-feature through the eighth. For another example, if a feature is divided into 6 sub-features, the first three sub-features may have equal channel numbers and the last three may have equal, larger channel numbers. Of course, it is not excluded that in some cases the nth sub-feature has more channels than the (n+1)th.
In other embodiments, the number of channels separating the feature into multiple sub-features based on the inter-channel information distribution in the feature is exactly the same, i.e., the features may be evenly grouped based on the inter-channel information distribution in the feature.
S102: and carrying out convolution processing on each sub-feature.
After dividing the feature into a plurality of sub-features according to channels based on step S101, convolution processing may be performed on each sub-feature to obtain a convolution processing result of each sub-feature.
S103: and splicing convolution processing results of all the sub-features to obtain target features of the features.
Alternatively, the convolution processing results of all the sub-features may be spliced by channels. Alternatively, the convolution processing results of all the sub-features may be spliced in the width direction and/or the height direction.
Wherein the concatenation order of the convolution processing results of all the sub-features may be identical to the order of the plurality of sub-features in the feature.
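Putting steps S101–S103 together, the following is a minimal end-to-end sketch assuming PyTorch; the group sizes, kernel shape, and output width per group are hypothetical choices, not values from this application.

```python
import torch
import torch.nn as nn

class NonUniformGroupConv(nn.Module):
    def __init__(self, group_sizes: list, out_per_group: int):
        super().__init__()
        self.group_sizes = group_sizes
        # one independent set of convolution parameters per sub-feature
        self.convs = nn.ModuleList(
            nn.Conv2d(c, out_per_group, kernel_size=3, padding=1) for c in group_sizes
        )

    def forward(self, feature: torch.Tensor) -> torch.Tensor:
        subs = torch.split(feature, self.group_sizes, dim=1)   # S101: divide by channels
        outs = [conv(s) for conv, s in zip(self.convs, subs)]  # S102: convolve each sub-feature
        return torch.cat(outs, dim=1)                          # S103: splice by channel

gconv = NonUniformGroupConv([16, 32, 64, 80], out_per_group=24)
print(gconv(torch.randn(1, 192, 32, 32)).shape)  # torch.Size([1, 96, 32, 32])
```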
In another possible implementation, the concatenation order of the convolution processing results of all the sub-features may also be different from the order of the plurality of sub-features in the feature.
Optionally, the convolution processing results of all sub-features may be reordered prior to the concatenation of the convolution processing results; the convolution processing results of all the sub-features can be spliced according to the new sequence.
In one example, the indexes of all sub-features may be randomly shuffled, resulting in a new ordering of all sub-features; and splicing the convolution processing results of all the sub-features according to the new sequence.
In another example, as shown in fig. 5, a first number of channels from each sub-feature (or from each sub-feature's convolution processing result) are combined to obtain a first combined feature; the first combined feature is processed to obtain a third feature vector of size G×1×1, where G is the total number of sub-features (i.e., the grouping number of the feature) and the feature values in the third feature vector correspond one-to-one to the sub-features; the feature values in the third feature vector are then reordered by magnitude, and the new order of the feature values is the new order of the corresponding sub-features, thereby determining the new order of all the sub-features. The convolution processing results of all the sub-features are then spliced in this new order.
The manner of processing the first combination feature is not limited, and may be specifically set according to actual situations. For example, the first combined feature may be processed in a pooling or convolution, or the like.
The first number mentioned above is also not limited and may be, for example, 1, 2 or 4. The location of the selected first number of channels in the sub-feature is also not limited, and may be, for example, a first front number of channels in the sub-feature or a first rear number of channels in the sub-feature. And the locations of the first number of channels in the different sub-features may be the same or different.
Further, reordering feature values in the third feature vector by size may refer to: sorting the feature values in the third feature vector from large to small; alternatively, the eigenvalues in the third eigenvector are ordered from small to large.
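A hedged sketch of this fig. 5 style reordering, assuming PyTorch and taking the first channel of each group with mean pooling (the pooling choice is an assumption), might be:

```python
import torch

def reorder_groups(sub_results: list) -> list:
    """Reorder sub-feature convolution results by a pooled per-group score."""
    # first channel of each (B, C, H, W) group, pooled to one value per group
    scores = torch.stack([s[:, 0].mean() for s in sub_results])  # G values
    new_order = torch.argsort(scores, descending=True)           # sort large-to-small
    return [sub_results[i] for i in new_order.tolist()]

subs = [torch.randn(1, c, 8, 8) for c in (16, 32, 64, 80)]
reordered = reorder_groups(subs)   # splice these by channel afterwards
```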
In yet another example, the indexes of all sub-features are matrixed to obtain an index matrix; performing transposition on the index matrix to obtain a transposed matrix; expanding the transposed matrix into one dimension to obtain new ordering of all sub-features; and splicing the convolution processing results of all the sub-features according to the new sequence.
For example, if a feature is split uniformly into 8 groups of sub-features by channel, with indexes 1, 2, 3, 4, 5, 6, 7 and 8 respectively, the indexes can be arranged into the following index matrix (in rows of three):

1 2 3
4 5 6
7 8

The transposed matrix obtained by transposition is:

1 4 7
2 5 8
3 6

The transposed matrix is then expanded into one dimension, the index order becomes 1, 4, 7, 2, 5, 8, 3, 6, and the convolution processing results of all the sub-features are spliced according to this new index order.
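In plain Python, this transpose-based reordering (rows of three, as implied by the resulting order) can be sketched as:

```python
indices = [1, 2, 3, 4, 5, 6, 7, 8]
rows = [indices[i:i + 3] for i in range(0, len(indices), 3)]   # [[1,2,3],[4,5,6],[7,8]]
# read column by column (the transpose), skipping the missing cell of the short row
new_order = [row[c] for c in range(3) for row in rows if c < len(row)]
print(new_order)   # [1, 4, 7, 2, 5, 8, 3, 6]
```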
In addition, all sub-features may also be reordered prior to step S102; then, performing convolution processing on each sub-feature by utilizing the step S102; and then splicing convolution processing results of all the sub-features according to the new sequence to obtain target features of the features.
In this embodiment, the feature processing method divides the feature into a plurality of sub-features by channel based on the information distribution among the channels of the feature, convolves the sub-features respectively, and splices the convolution results to obtain the target feature of the feature. Because the division equalizes the information amounts of the different sub-features, the convolution module corresponding to each sub-feature can learn the image features, each convolution module can extract the information in the image features, and the convolution parameters in the grouped convolution model can fully learn the features; the feature expression effect of the image features is thereby improved, and so are the effects of image processing, video processing and/or audio processing.
For example, consider an original convolution whose input feature size is C×H×W, whose convolution kernel size is C×K×K, and whose output feature size is N×H′×W′; its total number of parameters is C×K×K×N. After the feature is divided into G groups by the feature processing method, each group has input feature size (C/G)×H×W, convolution kernel size (C/G)×K×K, and output feature size (N/G)×H′×W′, so the parameters of one group number (C/G)×K×K×(N/G) and the total over the G groups is C×K×K×N/G. In this way, the total number of model parameters after grouping is reduced to 1/G of the original.
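This reduction can be checked against PyTorch's built-in grouped convolution; note that nn.Conv2d with groups=G implements uniform grouping only, so this is a sanity check of the parameter count rather than an implementation of the non-uniform scheme of this application.

```python
import torch.nn as nn

C, N, K, G = 192, 192, 3, 8
plain = nn.Conv2d(C, N, kernel_size=K, bias=False)              # C*K*K*N parameters
grouped = nn.Conv2d(C, N, kernel_size=K, groups=G, bias=False)  # C*K*K*N/G parameters

n_plain = sum(p.numel() for p in plain.parameters())
n_grouped = sum(p.numel() for p in grouped.parameters())
print(n_plain, n_grouped, n_plain / n_grouped)  # 331776 41472 8.0 -> reduced to 1/G
```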
In addition, after step S102, channel shuffling may also be performed on the convolution processing results of at least some of the sub-features, and the channel-shuffled convolution processing results are then spliced. Group channel shuffling mixes the grouped sub-features so that correlation still exists between the channels after grouping.
Alternatively, the convolution processing results of at least some of the sub-features may be channel shuffled using a variety of channel shuffling methods, such as the various embodiments shown below.
In one embodiment, channel shuffling may be performed in a fixed crossover manner: a second number of channels at fixed positions in group G1 are exchanged with a second number of channels at fixed positions in group G2; a third number of channels at fixed positions in G2 are exchanged with a third number of channels at fixed positions in G3; and so on. The second number and the third number may be equal or unequal, and may be set according to the actual situation without limitation herein; each may be, for example, 1, 2, or 4.
In another embodiment, the convolution processing results of at least some of the sub-features may be channel shuffled in a random manner. Specifically, the number of exchanges may be preset, and that number of random exchanges is then performed on the convolution processing results. In each random exchange, two sub-features are randomly extracted, and the channels to be exchanged within them are randomly extracted and swapped; the operation loops until the preset number of exchanges is reached.
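The following is an illustrative sketch of this random exchange, assuming PyTorch; the swap count and seed are hypothetical presets.

```python
import random
import torch

def random_channel_shuffle(groups: list, num_swaps: int, seed: int = 0) -> None:
    """Randomly swap channels between randomly chosen pairs of sub-features, in place."""
    rng = random.Random(seed)
    for _ in range(num_swaps):                    # loop until the preset exchange count
        a, b = rng.sample(range(len(groups)), 2)  # randomly extract two sub-features
        ca = rng.randrange(groups[a].shape[1])    # randomly pick the channels to exchange
        cb = rng.randrange(groups[b].shape[1])
        tmp = groups[a][:, ca].clone()
        groups[a][:, ca] = groups[b][:, cb]
        groups[b][:, cb] = tmp

groups = [torch.randn(1, c, 8, 8) for c in (16, 32, 64, 80)]
random_channel_shuffle(groups, num_swaps=4)
```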
In yet another embodiment, whether two sub-features need channel shuffling may be determined based on the correlation between them. If the correlation between two sub-features is too low (e.g., below a correlation threshold, or ranked low among all correlations), the two sub-features are not channel shuffled; if the correlation between the two sub-features is strong, they can be channel shuffled.
In this embodiment, at least one pair of sub-features is selected from all sub-features, wherein the selected at least one pair of sub-features has a higher correlation than all sub-features that are not selected. Wherein a pair of sub-features includes two sub-features.
In an example, the correlation of each pair of features may be calculated using a spatial distance calculation method or a correlation calculation model, or the like; at least one pair of sub-features is selected from among all of the sub-features based on the correlation of all of the sub-features.
In another example, as shown in fig. 6, a fourth number of channels from each sub-feature (or from each sub-feature's convolution processing result) may be combined to obtain a second combined feature; the second combined feature is processed to obtain a fourth feature vector of size G×1×1, where G is the total number of sub-features (i.e., the grouping number of the feature), the feature values in the fourth feature vector correspond one-to-one to the sub-features, and the closeness of the feature values corresponding to a pair of sub-features represents the correlation of that pair. At least one pair of sub-features is then selected from all the sub-features based on the fourth feature vector, the selected pair(s) having higher correlation than the unselected sub-features.
For example, the features of the first channel of each sub-feature may be combined to obtain a second combined feature of size G×H×W; the second combined feature is pooled, feature values corresponding to all sub-features are determined from the pooling result and reordered from large to small, the feature values are paired according to the correlation among them, the L pairs with the strongest correlation are selected, and their indexes are recorded to obtain an idx list, in which each pair of sub-features has relatively strong correlation.
After at least one pair of sub-features is selected, each pair of sub-features may be channel shuffled.
Wherein channels for preset locations in each pair of sub-features may be swapped.
Alternatively, the channels to be exchanged in each of the two sub-features of each pair of sub-features may be randomly extracted and exchanged.
Or, determining a channel to be exchanged of each sub-feature by a deep learning method; during the channel shuffling process of a pair of sub-features, channels to be exchanged of two sub-features of the pair of sub-features are exchanged.
As shown in fig. 6, determining the channels to be exchanged of each sub-feature by the deep learning method may include: convolving each sub-feature (or its convolution processing result) to obtain a fifth feature vector, where the number of feature values in the fifth feature vector equals the number of channels that the sub-feature needs to exchange in one pass; normalizing the fifth feature vector and multiplying its feature values by the total number of channels of the sub-feature to obtain a sixth feature vector; and rounding each feature value of the sixth feature vector down, up, or to the nearest integer to obtain the indexes of the channels that need to be exchanged in one pass, thereby determining the channels to be exchanged of each sub-feature.
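One possible reading of this procedure as code, assuming PyTorch, is sketched below; the layer shape, softmax normalization, pooling, and floor rounding are assumptions, since several rounding modes are allowed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwapChannelPredictor(nn.Module):
    def __init__(self, in_channels: int, num_swaps: int):
        super().__init__()
        # one output value per channel that the sub-feature must exchange in one pass
        self.conv = nn.Conv2d(in_channels, num_swaps, kernel_size=1)

    def forward(self, sub_feature: torch.Tensor) -> torch.Tensor:
        vec = F.adaptive_avg_pool2d(self.conv(sub_feature), 1).flatten(1)  # fifth feature vector
        probs = torch.softmax(vec, dim=1)            # normalize the fifth feature vector
        sixth = probs * sub_feature.shape[1]         # multiply by the group's channel total
        idx = sixth.floor().long()                   # round down to channel indexes
        return idx.clamp(max=sub_feature.shape[1] - 1)

predictor = SwapChannelPredictor(in_channels=32, num_swaps=2)
print(predictor(torch.randn(1, 32, 8, 8)))           # indexes of channels to exchange
```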
In addition, when the feature is an image feature, after the target feature of the image is obtained based on the method, the processing result of the image can be obtained based on the target feature. Alternatively, the results of the processing of segmentation, detection, identification, classification, decoding and/or encoding of the image may be obtained based on the target features, and is not limited in particular herein.
The above feature processing method may be applied in an image encoding method.
As shown in fig. 7, the image encoding method may include the steps of transformation and inverse transformation, quantization and inverse quantization, entropy encoding, and/or entropy decoding.
The transformation mainly adopts a convolutional neural network to carry out nonlinear downsampling, and has the effects of expressing main characteristics of an original image by using a more compact expression and reducing the dimension and the data volume of the image. The inverse transformation is to recover the original image from the compact representation.
Quantization is one of the lossy links of encoding, used to reshape the data and improve the compression rate. Inverse quantization is the opposite operation; it is optional and may be omitted, because the strong nonlinear capability of the neural network can absorb the effect of inverse quantization.
Entropy coding is a lossless process: the probability of each symbol in the features is calculated through a constructed probability model and encoded into a binary representation written into the code stream. Entropy decoding is the inverse of entropy encoding.
The above image encoding method may be performed by a model. For example, it may be performed using an end-to-end image codec composed entirely of neural networks. As shown in fig. 10, the codec model may include a main coding network and an entropy model network.
The main coding network may include both transform and inverse transform models.
In the transform/inverse transform network, a non-local attention module can be used, in which each element of the feature equals a weighted sum over all positions of the feature; the dimensionality is then successively reduced during the transform to reduce the data volume. The inverse transform network successively increases the dimensionality to restore the original data volume.
The entropy model network may include auxiliary transforms, auxiliary inverse transforms, quantization, inverse quantization, entropy encoding and/or entropy decoding, and constructing probability models. Wherein, the functions of the other modules are similar to the functions of the corresponding modules in the main coding network except for the construction of the probability model. The probability model is built mainly by learning model parameters through a neural network and is used for calculating the probability of the to-be-coded characteristics of the main coding network.
In the auxiliary transform/auxiliary inverse transform, the features are uniformly divided by channel in a grouped convolution manner into groups of fixed size, each group is convolved, and the groups are finally merged.
In some techniques, as shown in fig. 8, the entropy model network may also include a prediction module and a context module.
The prediction module can be used for differential coding: the output of the auxiliary inverse transform is fed into the prediction module, the difference between its output and the quantized transform features is computed, and the difference is sent to encoding. Optionally, uniform channel division may also be performed in the prediction module.
Since each sample to be encoded in the feature depends on previously encoded samples, the context model can be used to learn this correlation and reduce redundancy.
Referring to fig. 9, fig. 9 is a flowchart illustrating an embodiment of an image encoding method. It should be noted that, as long as substantially the same result is obtained, this embodiment is not limited to the flow order shown in fig. 9. In this embodiment, the image encoding method includes the following steps:
S201: obtain the target feature of the image based on any of the feature processing methods above.
Wherein, step S201 may include: processing the image to obtain a first characteristic of the image; and processing the first feature based on any one of the feature processing methods to obtain the target feature.
In the step of processing the image to obtain the first feature of the image, a group shuffling method and/or a channel shuffling method may be applied.
Wherein, the group shuffling method may comprise: grouping the second features of the image to obtain at least two sub-features of the second features; reordering all sub-features of the second feature; and splicing all the sub-features of the second feature or the processing results of all the sub-features according to the new sequence to obtain a group shuffling result of the second feature. The first feature of the image may be a set of shuffled results of the second feature or a feature resulting from processing the set of shuffled results of the second feature. In the specific implementation method for reordering all the sub-features of the second feature, reference may be made to the method for reordering all the sub-features of the feature described in step S103.
The channel shuffling method may include: grouping the fourth features of the image to obtain at least two sub-features of the fourth features; processing all sub-features of the fourth feature respectively; channel shuffling the processing results of at least part of the sub-features of the fourth feature; and splicing the latest processing results of all the sub-features of the fourth feature to obtain a channel shuffling result of the fourth feature. The first feature of the image may be a channel shuffling result of the fourth feature or a feature obtained by processing the channel shuffling result of the fourth feature. The specific implementation method of channel shuffling on the processing results of at least part of the sub-features of the fourth feature may refer to the above method of channel shuffling on the processing results of all the sub-features of the feature.
This is not particularly limited; the method can be applied in the transform, inverse transform, auxiliary transform, auxiliary inverse transform, prediction in differential coding, context processing, entropy encoding and/or entropy decoding steps of a coding method.
That is, the module corresponding to the feature processing method of the present application may be applied to the transformation/inverse transformation network or the auxiliary transformation/auxiliary inverse transformation network in fig. 10, the prediction module in the difference coding, the context module, and/or the entropy coding/decoding.
S202: the image is encoded based on the target feature.
In step S202, a group shuffling method and/or a channel shuffling method may also be applied.
Wherein, the group shuffling method may comprise: grouping the third features of the image to obtain at least two sub-features of the third features; reordering all sub-features of the third feature; and splicing all the sub-features of the third feature or the processing results of all the sub-features according to the new sequence to obtain a group shuffling result of the third feature. The third feature of the image may be a target feature or a feature resulting from processing the target feature of the first feature. In the specific implementation method for reordering all the sub-features of the third feature, reference may be made to the method for reordering all the sub-features of the feature described in step S103.
The channel shuffling method may include: grouping the fifth features of the image to obtain at least two sub-features of the fifth feature; processing all sub-features of the fifth feature respectively; channel shuffling the processing results of at least some of the sub-features of the fifth feature; and splicing the latest processing results of all the sub-features of the fifth feature to obtain a channel shuffling result of the fifth feature. The fifth feature of the image may be the target feature or a feature obtained by processing the target feature of the first feature. For the specific implementation of channel shuffling on the processing results of at least some sub-features of the fifth feature, reference may be made to the above method of channel shuffling the processing results of the sub-features of a feature.
The above-described group shuffling method and/or channel shuffling method may also be applied to the steps of transformation, inverse transformation, auxiliary inverse transformation, prediction in difference coding, context processing, entropy coding and/or entropy decoding of an image coding method, and is not particularly limited.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of an encoder of the present application. The encoder 10 includes a processor 12 configured to execute instructions to implement the feature processing method and the image encoding method described above. The specific implementation process is described in the above embodiments and is not repeated here.
The processor 12 may also be referred to as a CPU (Central Processing Unit ). The processor 12 may be an integrated circuit chip having signal processing capabilities. Processor 12 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 12 may be any conventional processor or the like.
Encoder 10 may further include a memory 11 for storing instructions and data necessary for processor 12 to operate.
The processor 12 is configured to execute instructions to implement the methods provided by any of the embodiments of the feature processing method and the image encoding method of the present application, as well as any non-conflicting combinations thereof.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 30 of this embodiment stores instructions/program data 31 which, when executed, implement the methods provided by any of the embodiments of the feature processing method and the image encoding method of the present application, as well as any non-conflicting combinations thereof. The instructions/program data 31 may be stored in the storage medium 30 as a software product, enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the methods of the embodiments of the present application. The storage medium 30 includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code, or a terminal device such as a computer, a server, a mobile phone, or a tablet.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises that element.
The foregoing describes only embodiments of the present application and does not limit its patent scope; all equivalent structures or equivalent process variations made using the contents of the specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the patent protection scope of the present application.

Claims (22)

1. A method of feature processing, the method comprising:
dividing the features into a plurality of sub-features according to channels based on the information distribution situation among the channels in the features, wherein the features are extracted based on at least one of the information in the images, the videos and the audios;
carrying out convolution processing on each sub-feature;
and splicing convolution processing results of all the sub-features to obtain the target feature.
2. The feature processing method of claim 1, wherein the plurality of sub-features includes a first sub-feature and a second sub-feature;
the number of channels of the first sub-feature is less than the number of channels of the second sub-feature;
wherein the first sub-feature comprises a greater number of channels having an image information level above a threshold than the second sub-feature.
3. The feature processing method according to claim 1, wherein the dividing the feature into a plurality of sub-features by channels based on the inter-channel information distribution in the feature comprises:
Processing the characteristics by using a deep learning method, and determining the number of channels of each of the plurality of sub-characteristics;
the feature is divided into the plurality of sub-features according to the number of channels of each of the plurality of sub-features.
4. The feature processing method according to claim 3, wherein the processing the features using a deep learning method to determine the number of channels of each of the plurality of sub-features includes:
convolving the features to obtain first intermediate features, wherein the number of channels of the first intermediate features is equal to the total number of the sub-features;
determining a probability for each of the plurality of sub-features based on the first intermediate feature;
and multiplying the probability of each sub-feature by the total number of channels of the feature to obtain the number of channels of each sub-feature.
5. The feature processing method of claim 4, wherein the determining the probability of each of the plurality of sub-features based on the first intermediate feature comprises:
downsampling the first intermediate feature to obtain a first feature vector of size G×1×1, where G is equal to the total number of sub-features;
And carrying out normalization processing on the characteristic values in the first characteristic vector to obtain the probability of each sub-characteristic in the plurality of sub-characteristics.
6. The feature processing method according to claim 1, wherein the dividing the feature into a plurality of sub-features by channels based on the inter-channel information distribution in the feature comprises:
presetting a channel number basic value of each of the plurality of sub-features;
processing the characteristics by using a deep learning method to determine the channel quantity offset value of each of the plurality of sub-characteristics;
adding the channel number basic value and the channel number offset value of each sub-feature to obtain the channel number of each sub-feature;
dividing the feature into a plurality of sub-features according to the number of channels of each of the plurality of sub-features;
wherein the sum of the number of channels of all sub-features is equal to the total number of channels of said feature.
7. The feature processing method according to claim 6, wherein the processing the feature using the deep learning method to determine the channel number offset value of each of the plurality of sub-features includes:
convolving the features to obtain second intermediate features, wherein the number of channels of the second intermediate features is equal to the total number of the sub-features;
Determining an offset probability for each of the plurality of sub-features based on the second intermediate feature;
multiplying the offset probability of each sub-feature by the offset amount to obtain the channel number offset value of each sub-feature;
wherein the product of the sum of the offset probabilities of all the sub-features and the offset, plus the sum of the channel number base values of all the sub-features, is equal to the total number of channels of the features.
8. The feature processing method according to claim 1, characterized in that the method further comprises:
reordering all sub-features;
the step of splicing the convolution processing results of all the sub-features to obtain the target features of the features comprises the following steps:
and splicing the convolution processing results of all the sub-features according to the new sequence.
9. The feature processing method of claim 8, wherein the reordering of all the sub-features comprises:
arranging the indexes of all the sub-features into a matrix to obtain an index matrix;
transposing the index matrix to obtain a transposed matrix;
and flattening the transposed matrix into one dimension to obtain the new ordering of all the sub-features.
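This reshape-transpose-flatten rule is the classic channel-shuffle permutation. A sketch follows; the row count of the index matrix is not fixed by the claim, so it appears here as a parameter (it must divide the number of sub-features):

```python
import numpy as np

def transpose_reorder(num_groups: int, rows: int) -> list[int]:
    """Claim 9 sketch: matrixize the group indexes, transpose, flatten to 1-D."""
    index_matrix = np.arange(num_groups).reshape(rows, -1)  # index matrix
    return index_matrix.T.flatten().tolist()                # transposed, expanded to 1-D

# Example: 8 sub-features arranged as a 2x4 index matrix give the
# new ordering [0, 4, 1, 5, 2, 6, 3, 7].
```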
10. The feature processing method of claim 8, wherein the reordering of all the sub-features comprises:
combining all the sub-features, or a first number of channels in the convolution processing results of all the sub-features, to obtain a first combined feature;
processing the first combined feature to obtain a one-dimensional third feature vector, wherein the feature values in the third feature vector correspond one-to-one to the sub-features;
and reordering the feature values in the third feature vector by magnitude, wherein the new ordering of the feature values is the new ordering of the sub-features corresponding to those feature values.
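One possible reading of claim 10 in code: pool the combined feature down to one score per sub-feature, then sort the sub-features by that score. The 1×1 scoring convolution and the descending sort direction are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedReorder(nn.Module):
    """Claim 10 sketch: derive a per-group score, then order groups by it."""
    def __init__(self, combined_channels: int, num_groups: int):
        super().__init__()
        self.score = nn.Conv2d(combined_channels, num_groups, kernel_size=1)

    def forward(self, subs: list[torch.Tensor]) -> list[int]:
        combined = torch.cat(subs, dim=1)                       # first combined feature
        vec = F.adaptive_avg_pool2d(self.score(combined), 1)    # B x G x 1 x 1
        third = vec.flatten(1).mean(dim=0)                      # third feature vector
        return torch.argsort(third, descending=True).tolist()   # reorder by magnitude
```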
11. The feature processing method according to claim 1, wherein the splicing of the convolution processing results of all the sub-features to obtain the target feature of the feature comprises:
performing group channel shuffling on the convolution processing results of the plurality of sub-features;
and splicing the convolution processing results subjected to group channel shuffling to obtain the target feature of the feature.
12. The feature processing method according to claim 11, wherein the performing of group channel shuffling on the convolution processing results of the plurality of sub-features comprises:
presetting a number of exchanges;
performing that number of random exchanges on the convolution processing results of the plurality of sub-features;
wherein each random exchange randomly selects two sub-features, and randomly selects and exchanges the channels to be exchanged in the two sub-features.
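A direct sketch of claim 12 (all random choices drawn uniformly; exchanging a single channel per random exchange is one possible reading):

```python
import random
import torch

def random_group_channel_shuffle(subs: list[torch.Tensor], num_swaps: int) -> list[torch.Tensor]:
    """Claim 12 sketch: repeat num_swaps times; each time pick two sub-features
    at random and swap one randomly chosen channel between them."""
    subs = [s.clone() for s in subs]                   # keep the inputs untouched
    for _ in range(num_swaps):
        i, j = random.sample(range(len(subs)), 2)      # two random sub-features
        ci = random.randrange(subs[i].shape[1])        # channel to exchange in group i
        cj = random.randrange(subs[j].shape[1])        # channel to exchange in group j
        tmp = subs[i][:, ci].clone()
        subs[i][:, ci] = subs[j][:, cj]
        subs[j][:, cj] = tmp
    return subs
```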
13. The feature processing method according to claim 11, wherein the performing of group channel shuffling on the convolution processing results of the plurality of sub-features comprises:
selecting at least one pair of sub-features from all the sub-features, wherein each pair of sub-features comprises two sub-features, and the correlation of each selected pair is higher than the correlation among the sub-features that are not selected;
and performing group channel shuffling on the convolution processing results of each pair of sub-features in the at least one pair of sub-features.
14. The feature processing method according to claim 13, wherein the selecting of at least one pair of sub-features from all the sub-features comprises:
combining all the sub-features, or a fourth number of channels in the convolution processing results of all the sub-features, to obtain a second combined feature;
processing the second combined feature to obtain a one-dimensional fourth feature vector, wherein the feature values in the fourth feature vector correspond one-to-one to the sub-features, and the closeness of the feature values corresponding to a pair of sub-features represents the correlation of that pair;
and selecting the at least one pair of sub-features from all the sub-features based on the fourth feature vector.
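A sketch of the pair selection in claim 14, assuming the fourth feature vector is already available as one score per sub-feature and taking absolute difference as the measure of closeness (greedy matching, closest pairs first):

```python
import torch

def select_correlated_pairs(vec: torch.Tensor, num_pairs: int) -> list[tuple[int, int]]:
    """Claim 14 sketch: sub-features whose scores in the fourth feature vector
    are closest are treated as the most correlated pairs."""
    g = vec.numel()
    dist = (vec.unsqueeze(0) - vec.unsqueeze(1)).abs()  # pairwise score distance
    dist.fill_diagonal_(float("inf"))                   # a group cannot pair with itself
    pairs, used = [], set()
    for idx in torch.argsort(dist.flatten()).tolist():  # closest pairs first
        i, j = divmod(idx, g)
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
            if len(pairs) == num_pairs:
                break
    return pairs
```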
15. The feature processing method of claim 13, wherein the performing of group channel shuffling on the convolution processing results of each of the at least one pair of sub-features comprises:
determining a channel to be exchanged for each sub-feature by a deep learning method;
and exchanging the channels to be exchanged of the two sub-features in each pair of sub-features.
16. The feature processing method according to claim 15, wherein the determining of the channel to be exchanged for each sub-feature by the deep learning method comprises:
convolving each sub-feature, or the convolution processing result of each sub-feature, to obtain a fifth feature vector for that sub-feature;
normalizing the fifth feature vector of each sub-feature;
and multiplying the feature values in the normalized fifth feature vector of each sub-feature by the total number of channels of that sub-feature to determine the indexes of the channels to be exchanged.
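A sketch of claims 15 and 16; the size of the scoring head (k, the number of channels exchanged per sub-feature) and the sigmoid normalization are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExchangeChannelPicker(nn.Module):
    """Claims 15-16 sketch: predict which channels of a sub-feature should take
    part in the exchange."""
    def __init__(self, in_channels: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, k, kernel_size=1)  # yields the fifth feature vector

    def forward(self, sub: torch.Tensor) -> list[int]:
        c = sub.shape[1]
        vec = F.adaptive_avg_pool2d(self.conv(sub), 1).flatten(1).mean(dim=0)
        p = torch.sigmoid(vec)                          # normalize values to (0, 1)
        return [min(int(v * c), c - 1) for v in p.tolist()]  # value * channel count -> index
```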
17. The feature processing method according to any one of claims 1 to 16, wherein after the convolution processing results of all the sub-features are spliced to obtain the target feature, the method further comprises:
performing image processing based on the target feature;
wherein the image processing comprises at least one of image encoding, image decoding, object detection, image segmentation, and image classification.
18. An image encoding method, characterized in that the encoding method comprises:
obtaining a target feature of an image based on the feature processing method of any one of claims 1 to 17;
and encoding the image based on the target feature.
19. The image encoding method according to claim 18, wherein
the obtaining of the target feature of the image based on the feature processing method of any one of claims 1 to 17 comprises: grouping a second feature of the image to obtain at least two sub-features of the second feature; reordering all the sub-features of the second feature; and splicing all the sub-features of the second feature, or the processing results of all the sub-features, according to the new ordering to obtain a group shuffling result of the second feature, wherein the feature is the group shuffling result of the second feature, or the feature is a feature obtained by processing the group shuffling result of the second feature; and/or
the encoding of the image based on the target feature comprises: grouping a third feature of the image to obtain at least two sub-features of the third feature; reordering all the sub-features of the third feature; splicing all the sub-features of the third feature, or the processing results of all the sub-features, according to the new ordering to obtain a group shuffling result of the third feature; and encoding the image based on the group shuffling result of the third feature, wherein the third feature is the target feature, or the third feature is a feature obtained by processing the target feature.
20. The image encoding method according to claim 18, wherein
the obtaining of the target feature of the image based on the feature processing method of any one of claims 1 to 17 comprises: grouping a fourth feature of the image to obtain at least two sub-features of the fourth feature; processing all the sub-features of the fourth feature respectively; performing group channel shuffling on the convolution processing results of the at least two sub-features of the fourth feature; and splicing the group-channel-shuffled convolution processing results to obtain a channel shuffling result of the fourth feature, wherein the feature is the channel shuffling result of the fourth feature, or the feature is a feature obtained by processing the channel shuffling result of the fourth feature; and/or
the encoding of the image based on the target feature comprises: grouping a fifth feature of the image to obtain at least two sub-features of the fifth feature; processing all the sub-features of the fifth feature respectively; performing group channel shuffling on the convolution processing results of the at least two sub-features of the fifth feature; splicing the group-channel-shuffled convolution processing results to obtain a channel shuffling result of the fifth feature; and encoding the image based on the channel shuffling result of the fifth feature, wherein the fifth feature is the target feature, or the fifth feature is a feature obtained by processing the target feature.
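By way of illustration only, the grouping-and-shuffling pattern that claims 19 and 20 embed in the encoder can be sketched as one block. The equal channel split, the even group count, the 3×3 per-group convolutions, and the claim-9-style permutation are all assumptions, not limitations of the claims:

```python
import torch
import torch.nn as nn

class GroupShuffleBlock(nn.Module):
    """Claims 19-20 sketch: group a feature by channels, process each group,
    reorder in claim-9 style, and splice back along the channel axis."""
    def __init__(self, channels: int, num_groups: int):
        super().__init__()
        # equal split and an even group count are simplifying assumptions
        assert channels % num_groups == 0 and num_groups % 2 == 0
        per = channels // num_groups
        self.convs = nn.ModuleList(
            nn.Conv2d(per, per, kernel_size=3, padding=1) for _ in range(num_groups)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        subs = torch.chunk(x, len(self.convs), dim=1)           # grouping
        outs = [conv(s) for conv, s in zip(self.convs, subs)]   # per-group processing
        order = torch.arange(len(outs)).reshape(2, -1).t().flatten()  # reordering
        return torch.cat([outs[i] for i in order.tolist()], dim=1)    # splice in new ordering
```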
21. An encoder, characterized in that the encoder comprises a processor, wherein the processor is configured to execute instructions to implement the steps of the method according to any one of claims 1 to 20.
22. A computer-readable storage medium having instructions/program data stored thereon, wherein the instructions/program data, when executed, implement the steps of the method according to any one of claims 1 to 20.
CN202211575266.XA 2022-12-08 2022-12-08 Feature processing method, image coding method and device Pending CN116033159A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211575266.XA CN116033159A (en) 2022-12-08 2022-12-08 Feature processing method, image coding method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211575266.XA CN116033159A (en) 2022-12-08 2022-12-08 Feature processing method, image coding method and device

Publications (1)

Publication Number Publication Date
CN116033159A true CN116033159A (en) 2023-04-28

Family

ID=86073143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211575266.XA Pending CN116033159A (en) 2022-12-08 2022-12-08 Feature processing method, image coding method and device

Country Status (1)

Country Link
CN (1) CN116033159A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912888A (en) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 Object recognition method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US10834415B2 (en) Devices for compression/decompression, system, chip, and electronic device
EP0934662B1 (en) Vector quantisation codebook generation method
CN113011581B (en) Neural network model compression method and device, electronic equipment and readable storage medium
CN110753225A (en) Video compression method and device and terminal equipment
WO2022028197A1 (en) Image processing method and device thereof
CN116033159A (en) Feature processing method, image coding method and device
CN111240746A (en) Floating point data inverse quantization and quantization method and equipment
CN113747163A (en) Image coding and decoding method and compression method based on context reorganization modeling
JP4860017B2 (en) Method and apparatus for high speed image compression
CN114615507B (en) Image coding method, decoding method and related device
Cui et al. An efficient deep quantized compressed sensing coding framework of natural images
CN113256744B (en) Image coding and decoding method and system
Verma et al. A "Network Pruning Network" Approach to Deep Model Compression
RU2408076C1 (en) Image compression method
CN115361559A (en) Image encoding method, image decoding method, image encoding device, image decoding device, and storage medium
CN111107377A (en) Depth image compression method, device, equipment and storage medium
CN115361555A (en) Image encoding method, image encoding device, and computer storage medium
KR102305981B1 (en) Method for Training to Compress Neural Network and Method for Using Compressed Neural Network
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN111131834B (en) Reversible self-encoder, encoding and decoding method, image compression method and device
CN113422965A (en) Image compression method and device based on generation countermeasure network
JP2022187683A (en) Data compression/decompression system and method
Mukherjee et al. Lossy image compression using SVD coding, compressive autoencoders, and prediction error-vector quantization
CN110751274A (en) Neural network compression method and system based on random projection hash
CN117915107B (en) Image compression system, image compression method, storage medium and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination