WO2022027917A1 - Image processing method, apparatus, system, electronic device, and readable storage medium - Google Patents

Image processing method, apparatus, system, electronic device, and readable storage medium Download PDF

Info

Publication number
WO2022027917A1
WO2022027917A1 (PCT/CN2020/140923; CN2020140923W)
Authority
WO
WIPO (PCT)
Prior art keywords
channel
feature map
point
output
output feature
Prior art date
Application number
PCT/CN2020/140923
Other languages
English (en)
French (fr)
Inventor
孙彬
赵明国
熊友军
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Priority to US17/388,043 (published as US20220044370A1)
Publication of WO2022027917A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present application relates to the field of computer technology, and in particular, to an image processing method, apparatus, system, electronic device, and readable storage medium.
  • the purpose of the present application is to provide an image processing method, apparatus, system, electronic device, and readable storage medium, which can improve the feature expression capability of the output feature map.
  • an embodiment of the present application provides an image processing method, the method comprising:
  • the multi-channel feature map is processed through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps;
  • the multi-channel first output feature map and the multi-channel second output feature map are fused to obtain a multi-channel third output feature map.
  • the multi-channel feature map is processed through a parallel point-by-point convolution operation and a non-point-by-point operation to obtain a multi-channel first output feature map and a multi-channel second output feature map, including:
  • the multi-channel first output feature map and the multi-channel second output feature map are fused to obtain a multi-channel output feature map, including:
  • the multi-channel first output feature map and the multi-channel second output feature map are accumulated to obtain the multi-channel output feature map.
  • the multi-channel feature map is divided into a first part of the multi-channel feature map and a second part of the multi-channel feature map in the channel dimension.
  • the multi-channel feature map is processed through the parallel point-by-point convolution operation and non-point-by-point operation to obtain a multi-channel first output feature map and a multi-channel second output feature map, including:
  • the multi-channel first output feature map and the multi-channel second output feature map are fused to obtain a multi-channel output feature map, including:
  • a channel reorganization operation is performed on the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel third output feature map.
  • the non-point-by-point convolution operation is an adaptive linear operation, and the non-point-by-point convolution operation is performed to obtain the multi-channel second output feature map, including:
  • the multiple channel feature maps are grouped in the channel dimension, and a linear operation is performed on each feature map group to obtain the multi-channel second output feature map.
  • the first ratio is a positive integer
  • the first ratio is the ratio of the number of channels of the multiple channel feature maps to the number of channels of the multi-channel second output feature map
  • the multiple channel feature maps on which the non-point-by-point convolution operation is to be performed are grouped in the channel dimension according to their number of channels and the number of channels of the multi-channel second output feature map, and a linear operation is performed on each feature map group to obtain the multi-channel second output feature map, including:
  • the multiple channel feature maps are equally divided in the channel dimension into multiple feature map groups according to the first ratio, wherein the number of feature maps in each feature map group is the first ratio;
  • for each feature map group, a linear operation is performed on the feature map corresponding to each channel in the group, and the cumulative sum of the linear operation results of the group is used as the second output feature map corresponding to that group.
  • the second ratio is a positive integer
  • the second ratio is the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the feature maps of the multiple channels
  • the multiple channel feature maps are grouped in the channel dimension, and a linear operation is performed on each feature map group to obtain the multi-channel second output feature map, including:
  • the multiple channel feature maps are evenly divided into multiple feature map groups, wherein the number of feature maps in each feature map group is 1;
  • for each feature map group, multiple linear operations are performed on the feature map in the group, and the multiple linear operation results corresponding to the feature map are used as the second output feature maps of the multiple channels corresponding to that feature map, wherein the number of linear operations performed on the feature map in one group is the second ratio.
  • the first ratio is the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map
  • the second ratio is the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the multiple channel feature maps.
  • according to the number of channels of the multiple channel feature maps and the number of channels of the multi-channel second output feature map, the multiple channel feature maps are grouped in the channel dimension, and a linear operation is performed on each feature map group to obtain the multi-channel second output feature map, including:
  • the plurality of channel feature maps and the multi-channel second output feature map are grouped in the channel dimension according to a target common divisor, wherein the number of feature maps in each feature map group is multiple, the number of second output feature maps in each second output feature map group is multiple, and the number of groups is the target common divisor;
  • for each feature map group, multiple linear operations are respectively performed on the group to obtain the second output feature map group corresponding to that group.
  • an embodiment of the present application provides an image processing apparatus, and the image processing apparatus includes:
  • an obtaining subunit configured to obtain a multi-channel feature map to be processed;
  • a processing subunit configured to process the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of the channels and the information exchange between feature maps;
  • the fusion subunit is configured to perform fusion processing on the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map.
  • an embodiment of the present application provides an image processing system, where the image processing system includes a network model, and the network model includes the image processing apparatus described in the foregoing embodiments.
  • the network model is a pose estimation model
  • the pose estimation model includes a preprocessing network, a main network, and a regressor connected in sequence
  • the preprocessing network is used to preprocess the input image to obtain the input feature map
  • the main network is used to process the input feature map to obtain an output feature map
  • the main network includes basic units, exchange units, and residual units, and at least one of the convolution units corresponding to the parts of the exchange unit where the input and output feature maps have the same resolution, the convolution units in the basic unit, and the convolution unit between two convolution units of the same size in the residual unit is the image processing device;
  • the regressor is used to convert the output feature map into a position heat map, and determine the positions of the joint points according to the position heat map.
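To make the regressor's role concrete, here is a minimal sketch of converting position heatmaps into joint coordinates with a per-joint argmax; the patent does not specify the decoding rule, so the function name, tensor layout, and the PyTorch usage below are illustrative assumptions only.

```python
import torch

def heatmaps_to_joints(heatmaps: torch.Tensor) -> torch.Tensor:
    """Convert position heatmaps of shape (N, J, H, W) into joint
    coordinates of shape (N, J, 2); a per-joint argmax is assumed."""
    n, j, h, w = heatmaps.shape
    flat = heatmaps.view(n, j, -1)        # flatten each heatmap
    idx = flat.argmax(dim=-1)             # index of the peak response
    ys, xs = idx // w, idx % w            # recover the 2-D peak position
    return torch.stack((xs, ys), dim=-1)  # (x, y) per joint
```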
  • an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores machine-executable instructions that can be executed by the processor, and the processor can execute the machine-executable instructions to implement the image processing method described in any one of the foregoing embodiments.
  • an embodiment of the present application provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the image processing method described in any one of the foregoing embodiments is implemented.
  • according to the image processing method, device, system, electronic device, and readable storage medium provided by the embodiments of the present application, after the multi-channel feature map to be processed is obtained, it is processed through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map; the obtained multi-channel first output feature map and multi-channel second output feature map are then fused to obtain a multi-channel third output feature map.
  • the obtained multi-channel third output feature map involves both the spatial features of each channel in the multi-channel feature map and the information exchange between feature maps, and the features of each point of the feature maps and the information exchange between channels, so that it contains more feature information, which improves the feature expression capability of the output feature map; meanwhile, the parallel computation avoids prolonging the time taken to obtain the multi-channel third output feature map.
  • FIG. 1 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 3 is one of the schematic diagrams of processing a multi-channel feature map provided by an embodiment of the present application.
  • FIG. 4 is the second schematic diagram of processing a multi-channel feature map provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of performing an adaptive linear operation on multiple channel feature maps on which a non-point-by-point convolution operation is to be performed, provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the structure of HRNet.
  • FIG. 8 is an example schematic diagram of the multi-resolution blocks in FIG. 7.
  • FIG. 9 is a schematic diagram of an exchange unit, a basic unit, and a residual unit in a pose estimation model provided by an embodiment of the present application.
  • Icons 100-electronic device; 110-memory; 120-processor; 130-communication unit; 200-image processing device; 210-acquisition sub-unit; 220-processing sub-unit; 230-fusion sub-unit.
  • FIG. 1 is a schematic block diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 described in the embodiments of the present application may be, but not limited to, a desktop computer, a tablet computer, a server, and the like.
  • the electronic device 100 may include a memory 110 , a processor 120 and a communication unit 130 .
  • the elements of the memory 110 , the processor 120 and the communication unit 130 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, these elements may be electrically connected to each other through one or more communication buses or signal lines.
  • the memory 110 is used for storing programs or data.
  • the memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
  • the processor 120 is used to read/write data or programs stored in the memory 110, and perform corresponding functions.
  • an image processing apparatus 200 is stored in the memory 110 , and the image processing apparatus 200 includes at least one software function module that can be stored in the memory 110 in the form of software or firmware.
  • the processor 120 executes various functional applications and data processing by running the software programs and modules stored in the memory 110, such as the image processing apparatus 200 in the embodiments of the present application, thereby implementing the image processing method in the embodiments of the present application.
  • the communication unit 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through a network, and to send and receive data through the network.
  • FIG. 1 is only a schematic structural diagram of the electronic device 100; the electronic device 100 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1. Each component shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method can be applied to the electronic device 100 described above. The specific flow of the image processing method is described in detail below.
  • Step S110: a multi-channel feature map to be processed is obtained.
  • Step S120: the multi-channel feature map is processed through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map.
  • non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps.
  • Step S130: the multi-channel first output feature map and the multi-channel second output feature map are fused to obtain a multi-channel third output feature map.
  • after the multi-channel feature map to be processed is obtained, parallel point-by-point convolution and non-point-by-point operations are used to process the multi-channel feature map, thereby obtaining the multi-channel first output feature map and the multi-channel second output feature map.
  • the multi-channel first output feature map is obtained through a point-by-point convolution operation
  • the multi-channel second output feature map is obtained through a non-point-by-point convolution operation.
  • the obtained multi-channel first output feature map and the multi-channel second output feature map are fused to obtain a multi-channel third output feature map.
  • point-by-point convolution is used to describe the features of each point and information exchange between input channels
  • non-point-by-point convolution is used to describe the spatial features of each input channel and the information exchange between several input feature maps. That is, the obtained multi-channel third output feature map involves both the spatial features of each channel in the multi-channel feature map and the information exchange between the feature maps, and the features of each point of the feature maps and the information exchange between channels.
  • the above two operations are processed in parallel.
  • the obtained multi-channel third output feature map contains more feature information, which improves the feature expression capability of the output feature map and avoids the situation where valid information is not extracted as a feature because only a certain aspect of information is considered; at the same time, the time taken to obtain the multi-channel third output feature map is not prolonged.
  • for example, if only non-point-by-point convolution were performed on the multi-channel feature map to be processed and the result were used as the multi-channel third output feature map, this processing would ignore the features of each point in each channel's feature map and the information exchange between the input channels.
  • however, the ignored information, which is considered redundant, can give a better understanding of the multi-channel feature map to be processed. Therefore, this solution improves the ability of the multi-channel third output feature map to express the features of the multi-channel feature map to be processed.
  • the multi-channel feature map may be obtained by receiving a multi-channel feature map sent by other devices or processing an input image.
  • FIG. 3 is one of schematic diagrams of processing a multi-channel feature map provided by an embodiment of the present application.
  • PWC in Figure 3 represents point-by-point convolution
  • non-PWC represents non-point-by-point convolution.
  • the multi-channel first output feature map, the multi-channel second output feature map, and the multi-channel third output feature map may be obtained in the following manner.
  • a point-by-point convolution operation is performed on the multi-channel feature map to obtain the multi-channel first output feature map
  • a non-point-by-point convolution operation is performed on the multi-channel feature map to obtain the multi-channel second output feature map. That is, the objects processed by point-by-point convolution and non-point-by-point convolution are exactly the same.
  • an accumulation operation is performed on the multi-channel first output feature map and the multi-channel second output feature map, that is, "Add" processing is performed, so as to obtain the multi-channel third output feature map.
  • the above-mentioned accumulation operation refers to performing an addition operation on the same pixel points of the multiple output feature maps.
  • feature map A1 in the multi-channel first output feature map corresponds to feature maps 1 to 3 in the multi-channel feature map
  • feature map B1 in the multi-channel second output feature map also corresponds to feature maps 1 to 3 in the multi-channel feature map
  • feature map A1 and feature map B1 are accumulated at the same pixel points to realize the fusion of feature map A1 and feature map B1
  • the fused feature map is the third output feature map of one channel.
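To make the FIG. 3 flow concrete, here is a minimal PyTorch sketch that runs a point-by-point branch and a non-point-by-point branch in parallel on the same input and fuses them by "Add". Depthwise convolution is assumed for the non-point-by-point branch, and the class and parameter names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class ParallelConvBlock(nn.Module):
    """Parallel point-by-point (PWC) and non-point-by-point (here,
    depthwise) branches over the same input, fused by "Add"."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # PWC branch: per-point features and cross-channel exchange.
        self.pwc = nn.Conv2d(channels, channels, kernel_size=1)
        # Non-PWC branch: per-channel spatial features.
        self.non_pwc = nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pwc(x) + self.non_pwc(x)  # element-wise "Add" fusion

out = ParallelConvBlock(32)(torch.randn(1, 32, 56, 56))  # -> (1, 32, 56, 56)
```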
  • FIG. 4 is a second schematic diagram of processing a multi-channel feature map provided by an embodiment of the present application.
  • Split in Figure 4 represents grouping, and Shuffle represents reorganization.
  • the multi-channel first output feature map, the multi-channel second output feature map, and the multi-channel third output feature map may be obtained in the following manner.
  • the multi-channel feature map is divided into a first part of the multi-channel feature map and a second part of the multi-channel feature map in the channel dimension.
  • the first part of the multi-channel feature map and the second part of the multi-channel feature map may include the same or different numbers of feature maps; that is, the division of the multi-channel feature map in the channel dimension may be even or uneven. If it is even, the multi-channel feature map is equally divided in the channel dimension into the first part of the multi-channel feature map and the second part of the multi-channel feature map.
  • in one implementation, when the multi-channel feature map is obtained, it is already divided into the two parts, namely the first part of the multi-channel feature map and the second part of the multi-channel feature map.
  • in another implementation, after the multi-channel feature map is obtained and before the point-by-point convolution operation and the non-point-by-point operation are performed, the multi-channel feature map may be divided into two parts to obtain the first part of the multi-channel feature map and the second part of the multi-channel feature map.
  • the multi-channel feature map may be divided into the first part of the multi-channel feature map and the second part of the multi-channel feature map by using a preset channel separation operator.
  • the multi-channel first output feature map and the multi-channel second output feature map may first be channel-merged in a "Concat" manner
  • then "Shuffle" processing, that is, channel shuffling, is performed to obtain the multi-channel third output feature map.
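A minimal sketch of the FIG. 4 flow under the same assumptions: the channels are split in half, the two branches process their halves, and the results are concatenated and channel-shuffled. The shuffle is realized with the usual reshape-transpose-reshape idiom; the even split and depthwise non-PWC branch are illustrative choices.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """"Shuffle": reshape into groups, transpose, and flatten back."""
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).reshape(n, c, h, w))

class SplitShuffleBlock(nn.Module):
    """"Split" in the channel dimension, PWC on one half, depthwise
    (non-PWC) convolution on the other, then "Concat" and "Shuffle"."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        half = channels // 2  # even split assumed for simplicity
        self.pwc = nn.Conv2d(half, half, kernel_size=1)
        self.non_pwc = nn.Conv2d(half, half, kernel_size,
                                 padding=kernel_size // 2, groups=half)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x.chunk(2, dim=1)                              # "Split"
        y = torch.cat((self.pwc(x1), self.non_pwc(x2)), dim=1)  # "Concat"
        return channel_shuffle(y, groups=2)                     # "Shuffle"
```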
  • the non-point-by-point convolution operation may be any one of a depthwise convolution operation, a group convolution operation, a dilated convolution operation, a deconvolution operation, and a standard convolution operation whose kernel size is not 1 × 1.
  • the non-point-by-point convolution operation may also be an adaptive linear operation.
  • when the adaptive linear operation is adopted, the multi-channel second output feature map can be obtained in the following manner: according to the number of channels of the multiple channel feature maps on which the non-point-by-point convolution operation is to be performed and the number of channels of the multi-channel second output feature map, the multiple channel feature maps are grouped in the channel dimension, and a linear operation is performed on each feature map group to obtain the multi-channel second output feature map.
  • the multiple channel feature maps to be subjected to the non-point-wise convolution operation are determined by the objects targeted by the non-point-wise convolution operation.
  • as shown in FIG. 3, when point-by-point convolution and non-point-by-point convolution process exactly the same object, the multiple channel feature maps are the multi-channel feature map to be processed.
  • as shown in FIG. 4, when they process different objects, the multiple channel feature maps are the second part of the multi-channel feature map.
  • the first ratio and the second ratio can be obtained first; it is then judged whether the first ratio or the second ratio is a positive integer, and the multiple channel feature maps are grouped in the channel dimension according to whichever ratio is a positive integer.
  • the first ratio is the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map
  • the second ratio is the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the plurality of channel feature maps.
  • the plurality of channel feature maps may be equally divided into a plurality of feature map groups in the channel dimension according to the first ratio.
  • the number of feature maps in each feature map group is the first ratio. Then, for each feature map group, a linear operation is performed on the feature map corresponding to each channel in the group, and the cumulative sum of the linear operation results of the group is used as the second output feature map corresponding to that group.
  • 1 , 2 , .
  • the feature map of the ith channel in the multi-channel feature map to be processed Y represents the second output feature map, Y [i] represents the second output feature map of the ith channel, and ⁇ i represents the second output feature map of each channel
  • C i represents the number of channels of the multi-channel feature map to be processed
  • C o represents the number of channels of the multi-channel second output feature map.
  • the first ratio is a positive integer, that is, when the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map is a positive integer, based on ⁇ i
  • the ⁇ channel feature maps in the channel feature map are respectively subjected to linear operation, and the sum of the linear operation results is used as the second output feature map Y [i] of the ith channel.
  • the second output feature map Y [i] is obtained by processing ⁇ channel feature maps in the plurality of channel feature maps.
  • the above-mentioned linear operation may be implemented by methods such as affine transformation, wavelet transformation, and the like.
  • convolution is an efficient operation that is well supported by hardware and can cover many widely used linear operations, so depthwise separable convolution can also be used to implement the above linear operation; that is, the above adaptive linear operation may be an adaptive convolution operation (Adaptive Convolution, AC).
  • suppose the multiple channel feature maps are X_1 to X_12
  • the multi-channel second output feature maps are Y_1 to Y_6, and there are six convolution kernels Φ_1 to Φ_6
  • the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map is 2, so the channel feature maps are divided into six feature map groups: X_1, X_2; X_3, X_4; X_5, X_6; X_7, X_8; X_9, X_10; X_11, X_12.
  • each feature map group includes 2 channel feature maps.
  • after feature maps X_1 and X_2 are each computed with kernel Φ_1, the two results are accumulated to give the second output feature map Y_1 corresponding to X_1 and X_2; the second output feature map Y_2 corresponding to X_3 and X_4 is calculated from Φ_2 in the same way.
  • by analogy, the second output feature maps Y_3 to Y_6 can be obtained.
  • thus, the multi-channel second output feature maps Y_1 to Y_6 are obtained.
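When the first ratio α = C_i/C_o is a positive integer, performing one linear operation per channel in a group and accumulating the results coincides with a grouped convolution having C_o groups; the following one-line sketch assumes the generic linear operation is realized as a K × K convolution, which is one option the text names.

```python
import torch.nn as nn

# First-ratio case (C_i / C_o = alpha, a positive integer): groups=C_o gives
# each group alpha input channels and one output channel, so every output
# map is the accumulated sum of alpha per-channel linear operations, as in
# formula (1). Channel counts below follow the 12 -> 6 example.
c_in, c_out, k = 12, 6, 3
adaptive_reduce = nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=c_out)
```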
  • the plurality of channel feature maps may be equally divided into a plurality of feature map groups in the channel dimension.
  • the number of feature maps in each feature map group is 1.
  • for each feature map group, multiple linear operations are performed on the feature map in the group, and the multiple linear operation results corresponding to the feature map are used as the second output feature maps of the multiple channels corresponding to that feature map.
  • the number of linear operations performed on the feature maps in one feature map group is the second ratio.
  • this calculation can be expressed as: Y[(iα):(iα+α−1)] = Φ_i(X[i]), if C_o/C_i = α      (2)
  • when the second ratio is a positive integer, that is, when the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the plurality of channel feature maps is a positive integer, each channel feature map is subjected, based on Φ_i, to α linear operations, so that α second output feature maps are generated for each channel feature map.
  • the above-mentioned linear operation may be implemented by methods such as affine transformation, wavelet transformation, and the like.
  • the above linear operations can also be implemented using depthwise separable convolutions.
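The second-ratio case admits the mirror-image sketch: with C_i groups, each input channel is convolved α = C_o/C_i times, yielding the α second output feature maps per channel; again a K × K convolution stands in for the generic linear operation as an assumption.

```python
import torch.nn as nn

# Second-ratio case (C_o / C_i = alpha, a positive integer): groups=C_i puts
# one input channel in each group, which is convolved alpha times to give
# the alpha second output feature maps of formula (2).
c_in, c_out, k = 6, 12, 3
adaptive_expand = nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=c_in)
```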
  • for a standard convolutional layer, the size of the input feature map is assumed to be H_i × W_i × C_i, where H_i denotes the height of the input feature map, W_i its width, and C_i the number of input channels.
  • the size of the output feature map is H_o × W_o × C_o, where H_o denotes the height of the output feature map, W_o its width, and C_o the number of output channels.
  • the size of each convolution kernel is then K × K × C_i, so the total computational overhead is: FLOPs(Standard) = H_o × W_o × K × K × C_i × C_o      (3)
  • Convolution operations such as depthwise convolution and group convolution can effectively reduce FLOPs.
  • group convolution is to group the input feature maps, and then convolve each group separately.
  • this convolution requires setting a grouping parameter G. If different convolution layers use the same parameter G, for feature maps with more channels, there will be more channels in each group, which may lead to less efficient convolution per group.
  • when the number of groups equals the number of input channels and the number of output channels equals the number of input channels, the group convolution becomes a depthwise convolution.
  • although this further reduces FLOPs, a depthwise convolution cannot change the number of channels, so it must be followed by a pointwise convolution, which increases latency.
  • the above-mentioned manner of obtaining the multi-channel second output feature map by using the first ratio and the second ratio neither requires manually setting the parameters of each layer as ordinary group convolution does,
  • nor constrains the number of input channels and the number of output channels to be the same as depthwise convolution does.
  • multiple feature maps can be adaptively generated from one feature map, and multiple feature maps can also be adaptively combined into one feature map.
  • C i input feature maps can adaptively generate C o output feature maps (as shown in Figure 5).
  • the computational cost of adaptive convolution is: H_o × W_o × K × K × C_i × C_o / min(C_i, C_o)      (4)
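The two cost formulas can be checked numerically; the following helper and example layer illustrate formulas (3) and (4) only and are not code from the patent — the layer sizes are arbitrary.

```python
def conv_flops(h_out: int, w_out: int, k: int, c_in: int, c_out: int,
               adaptive: bool = False) -> int:
    """Per formula (3) for standard convolution; per formula (4), i.e.
    divided by min(C_i, C_o), for the adaptive convolution."""
    flops = h_out * w_out * k * k * c_in * c_out
    return flops // min(c_in, c_out) if adaptive else flops

# Example layer: 3x3 kernels, 56x56 output, 64 -> 64 channels.
standard = conv_flops(56, 56, 3, 64, 64)        # 115,605,504
adaptive = conv_flops(56, 56, 3, 64, 64, True)  # 1,806,336 (64x fewer)
```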
  • if neither the first ratio, that is, the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map,
  • nor the second ratio, that is, the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the plurality of channel feature maps, is a positive integer,
  • the multi-channel second output feature map can also be obtained by grouping the plurality of channel feature maps and then calculating in the following manner.
  • a common divisor of the number of channels of the plurality of channel feature maps and the number of channels of the multi-channel second output feature map is obtained. Then, take one of the common divisors as the target common divisor.
  • any common divisor may be used as the target common divisor, or the greatest common divisor may be selected as the target common divisor.
  • the target common divisor may be selected in other ways. In one form of this embodiment, the target common divisor is the greatest common divisor.
  • the plurality of channel feature maps and the multi-channel second output feature maps are grouped in a channel dimension according to the target common divisor.
  • the number of feature maps in each feature map group is multiple, and the number of second output feature maps in each second output feature map group is multiple.
  • the number of groups is the target common divisor. For each feature map group, multiple linear operations are respectively performed on the group to obtain the second output feature map group corresponding to that group.
  • for example, if the target common divisor is b, the multiple channel feature maps and the multi-channel second output feature map are each divided into b groups.
  • the manner of obtaining the second output feature map group corresponding to a feature map group is similar to the post-grouping calculation in group convolution. For example, if the linear operation is realized by a convolution operation, one feature map group includes 4 feature maps, one second output feature map group includes 3 second output feature maps, and the convolution kernel group corresponding to the feature map group includes kernels 1, 2, and 3, then one second output feature map is obtained from each kernel together with the 4 feature maps in the group.
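When neither ratio is an integer, grouping by a common divisor again matches a grouped convolution whose group count is that divisor; the sketch below assumes the greatest common divisor is chosen, as the embodiment prefers, and that the linear operation is a K × K convolution.

```python
import math
import torch.nn as nn

# Neither-ratio-integer case: group by a common divisor of C_i and C_o
# (here the greatest common divisor). Each of the b groups then maps
# C_i/b input maps to C_o/b output maps, mirroring the
# four-maps-to-three-maps example above.
c_in, c_out, k = 12, 9, 3
b = math.gcd(c_in, c_out)  # target common divisor: 3 groups
adaptive_group = nn.Conv2d(c_in, c_out, k, padding=k // 2, groups=b)
```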
  • the adaptive linear operation can describe the information exchange within a group well and can effectively reduce the number of parameters and FLOPs. Since the redundant information in the feature maps can give a better understanding of the input data, the embodiments of the present application also retain some redundant information in the feature maps through point-by-point convolution, thereby avoiding the lack of inter-group information communication that occurs when only adaptive linear operations are used. Therefore, the embodiments of the present application process the multi-channel feature map to be processed through parallel point-by-point convolution operations and non-point-by-point convolution operations, which improves the feature expression capability of the obtained multi-channel third output feature map while using only a small number of FLOPs and parameters.
  • when the number of channels in the first part of the multi-channel feature map and the number of channels in the second part of the multi-channel feature map are the same, K << min(C_i, C_o), and the multi-channel third output feature map is obtained through the adaptive convolution corresponding to formula (4) together with channel reorganization, the computation ratio of standard convolution to the embodiment of the present application is approximately 4 × K × K.
  • when the non-point-by-point operation is the adaptive convolution corresponding to formula (4) and K << min(C_i, C_o), the ratio of the computation of obtaining the multi-channel third output feature map without channel reorganization to that of obtaining it with equal channel division and channel reorganization is approximately 4.
  • obtaining the multi-channel third output feature map in the manner corresponding to channel reorganization thus speeds up obtaining the multi-channel third output feature map. Comparing FIG. 3 and FIG. 4 shows that replacing the "Add" operation with "Split", "Concat" and "Shuffle" can further reduce FLOPs while maintaining the information exchange between channels.
  • the image processing apparatus 200 may adopt the device structure of the electronic device 100 shown in FIG. 1 above.
  • FIG. 6 is a schematic block diagram of an image processing apparatus 200 provided by an embodiment of the present application. It should be noted that the basic principles and technical effects of the image processing apparatus 200 provided in this embodiment are the same as those of the above embodiments; for brevity, reference may be made to the corresponding content of the above embodiments for the parts not mentioned in this embodiment.
  • the image processing apparatus 200 can be applied to the electronic device 100 .
  • the image processing apparatus 200 can conveniently replace a standard convolution unit without changing the network architecture. That is, standard convolutions or other convolutions in existing neural network structures can be replaced with the image processing apparatus 200.
  • the image processing apparatus 200 has a plug-and-play feature.
  • the image processing apparatus 200 may include an obtaining subunit 210 , a processing subunit 220 and a fusion subunit 230 .
  • the obtaining subunit 210 is used to obtain the multi-channel feature map to be processed.
  • the processing subunit 220 is configured to process the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map.
  • non-pointwise convolution is used to describe the spatial features of channels and the information exchange between feature maps.
  • the fusion subunit 230 is configured to perform fusion processing on the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map.
  • the above-mentioned subunits may be stored in the memory 110 shown in FIG. 1 in the form of software or firmware (Firmware), or solidified in the operating system (Operating System, OS) of the electronic device 100, and may be executed by the processor 120 in FIG. 1. Meanwhile, the data and program codes required to execute the above subunits may be stored in the memory 110.
  • quantization-based methods can reduce redundancy in computation; however, during inference, the shared weights need to be restored to their original positions, so running memory cannot be saved. Pruning-based methods can reduce redundant connections in pretrained models; however, finding the best performance requires a lot of work.
  • AlexNet first proposed group convolution, which distributed the model on two GPUs (Graphics Processing Unit).
  • Depthwise convolutions require the same input and output channels, so 1 ⁇ 1 convolutions are usually added before or after depthwise convolutional layers. In this case, it is inevitable to decompose a convolutional layer into two or three concatenated layers.
  • MobileNet splits each layer into two layers, first using depthwise convolutions and then using pointwise convolutions. Although this method reduces parameters and FLOPs, it introduces latency.
  • an embodiment of the present application further provides an image processing system, wherein the image processing system includes a network model, and the network model includes the image processing apparatus 200 .
  • the image processing apparatus 200 can adaptively obtain an output feature map of a desired size, and improve the feature expression capability of the output feature map without increasing the delay, and also has the characteristics of reducing FLOPs and the amount of parameters. Thereby, the processing speed of the image processing system can be improved, and the quality of the processing result can be guaranteed.
  • the network model may be, but is not limited to, a pose estimation model, among other models.
  • the network model is a pose estimation model
  • the pose estimation model includes a preprocessing network, a main network, and a regressor connected in sequence.
  • the preprocessing network is used to preprocess the input image to obtain the input feature map.
  • the main network is used to process the input feature map to obtain an output feature map, wherein the main network includes basic units, exchange units, and residual units, and at least one of the convolution units corresponding to the parts of the exchange unit where the input and output feature maps have the same resolution, the convolution units in the basic unit, and the convolution unit between two convolution units of the same size in the residual unit is the image processing apparatus 200.
  • the regressor is used to convert the output feature map into a position heat map, and determine the positions of the joint points according to the position heat map.
  • the preprocessing network may be a stem composed of two convolutional units for reducing the resolution, and the preprocessing network may reduce the resolution to 1/4 of the input image.
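A plausible reading of this stem is two stride-2 3 × 3 convolution units, each halving the resolution for 1/4 overall; the channel widths and the BatchNorm/ReLU pattern below are assumptions, since the text fixes only the resolution ratio.

```python
import torch.nn as nn

# Two stride-2 3x3 convolution units: each halves the resolution, giving
# the stated 1/4 overall. Widths of 64 and the BN/ReLU pattern are
# assumptions, not specified by the patent.
stem = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
)
```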
  • the input feature map and output feature map of the main network have the same resolution.
  • the main network may be HRNet.
  • HRNet starts from a high-resolution branch in the first stage, maintains the high-resolution representation throughout, and at each subsequent stage adds to the current branches a new parallel branch at 1/2 the lowest resolution among the current branches.
  • HRNet contains a total of four stages (that is, there are four regions with background colors and connections in the network in Figure 7).
  • the first stage consists of 4 residual units and 1 exchange unit; each residual unit consists of a bottleneck with a width of 64 followed by a 3 × 3 convolution unit. The exchange unit generates two branches: one branch reduces the width of the feature maps to 32, and the other branch reduces the resolution to 1/2.
  • in the second, third, and fourth stages, each stage consists of 4 parallel basic units and one exchange unit.
  • the second, third, and fourth stages contain 1, 4, and 3 multi-resolution blocks, respectively.
  • the multi-resolution blocks of the second, third, and fourth stages serve as two-, three-, and four-resolution blocks, respectively.
  • for simplicity of description, each node in FIG. 7 has only one multi-resolution block.
  • a four-resolution block contains 4 branches, each branch contains 4 basic units (as shown in a in FIG. 8), and each basic unit contains two 3 × 3 convolution units.
  • a cross-resolution exchange unit (as shown in b in Fig. 8) completes the information exchange of the four branches.
  • the residual unit in the first stage, the basic units in the second to fourth stages, and the relevant convolution units in the exchange unit in the HRNet main network may all be the image processing apparatus 200 .
  • for the exchange unit, the convolution units of the parts where the input and output feature maps have the same resolution (that is, the dark gray parts in a in FIG. 9) are set as the image processing apparatus 200, so that a lightweight exchange unit is included in the pose estimation model.
  • for the basic unit, the two convolution units in the basic unit (that is, the dark gray parts in b in FIG. 9) are set as the image processing apparatus 200, so that the pose estimation model includes lightweight basic units.
  • for the residual unit, the convolution unit between two convolution units of the same size in the residual unit (that is, the dark gray part in c in FIG. 9) is set as the image processing apparatus 200, so that the pose estimation model includes lightweight residual units.
  • the pose network model in the embodiments of the present application is a lightweight model.
  • the pose network model can improve the efficiency of pose estimation and obtain comparable accuracy.
  • Embodiments of the present application further provide a readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the image processing method is implemented.
  • the embodiments of the present application provide an image processing method, device, system, electronic device, and readable storage medium.
  • after the multi-channel feature map to be processed is obtained, it is processed through parallel point-by-point convolution operations and non-point-by-point operations to obtain the multi-channel first output feature map and the multi-channel second output feature map; the obtained multi-channel first output feature map and multi-channel second output feature map are then fused to obtain a multi-channel third output feature map.
  • non-point-wise convolution is used to describe the spatial features of each channel and the information exchange between feature maps.
  • the obtained multi-channel third output feature map involves both the spatial features of each channel in the multi-channel feature map and the information exchange between feature maps, and the features of each point of the feature maps and the information exchange between channels, so that it contains more feature information, which improves the feature expression capability of the output feature map; meanwhile, the parallel computation avoids prolonging the time taken to obtain the multi-channel third output feature map.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method, apparatus, system, electronic device, and readable storage medium, relating to the field of computer technology. The method includes: obtaining a multi-channel feature map to be processed (S110); processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map (S120); and fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel third output feature map (S130). The multi-channel third output feature map obtained by this method involves both the spatial features of each channel in the multi-channel feature map and the information exchange between feature maps, and the features of each point of the feature maps and the information exchange between channels, thereby improving the feature expression capability of the output feature map.

Description

Image processing method, apparatus, system, electronic device, and readable storage medium
This application claims priority to Chinese Patent Application No. 202010775325.2, filed with the Chinese Patent Office on August 5, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to an image processing method, apparatus, system, electronic device, and readable storage medium.
Background
With the development of deep learning, more and more fields acquire image features by obtaining feature maps from images through neural networks, and then use the image features for action recognition, face recognition, and the like. It follows that if the obtained output feature map expresses the features of the image poorly, the subsequent recognition results will be poor. Therefore, how to improve the feature expression capability of the output feature map has become a technical problem that those skilled in the art urgently need to solve.
Technical Problem
In view of this, the purpose of the present application is to provide an image processing method, apparatus, system, electronic device, and readable storage medium that can improve the feature expression capability of the output feature map.
Technical Solution
To achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides an image processing method, the method including:
obtaining a multi-channel feature map to be processed;
processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps; and
fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel third output feature map.
In an optional implementation, processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map includes:
performing a point-by-point convolution operation on the multi-channel feature map to obtain the multi-channel first output feature map; and
performing a non-point-by-point convolution operation on the multi-channel feature map to obtain the multi-channel second output feature map;
and fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map includes:
performing an accumulation operation on the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel output feature map.
In an optional implementation, the multi-channel feature map is divided in the channel dimension into a first part of the multi-channel feature map and a second part of the multi-channel feature map, and processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map includes:
performing a point-by-point convolution operation on the first part of the multi-channel feature map to obtain the multi-channel first output feature map; and
performing a non-point-by-point convolution operation on the second part of the multi-channel feature map to obtain the multi-channel second output feature map;
and fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map includes:
performing a channel reorganization operation on the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel third output feature map.
In an optional implementation, the non-point-by-point convolution operation is an adaptive linear operation, and performing the non-point-by-point convolution operation to obtain the multi-channel second output feature map includes:
grouping, in the channel dimension, the multiple channel feature maps on which the non-point-by-point convolution operation is to be performed according to their number of channels and the number of channels of the multi-channel second output feature map, and performing a linear operation on each feature map group to obtain the multi-channel second output feature map.
In an optional implementation, if a first ratio is a positive integer, the first ratio being the ratio of the number of channels of the multiple channel feature maps to the number of channels of the multi-channel second output feature map, the grouping of the multiple channel feature maps in the channel dimension and the linear operation on each feature map group to obtain the multi-channel second output feature map include:
equally dividing, in the channel dimension, the multiple channel feature maps into multiple feature map groups according to the first ratio, wherein the number of feature maps in each feature map group is the first ratio; and
for each feature map group, performing a linear operation on the feature map corresponding to each channel in the group, and using the cumulative sum of the linear operation results of the group as the second output feature map corresponding to the group.
In an optional implementation, if a second ratio is a positive integer, the second ratio being the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the multiple channel feature maps, the grouping of the multiple channel feature maps in the channel dimension and the linear operation on each feature map group to obtain the multi-channel second output feature map include:
equally dividing, in the channel dimension, the multiple channel feature maps into multiple feature map groups, wherein the number of feature maps in each feature map group is 1; and
for each feature map group, performing multiple linear operations on the feature map in the group, and using the multiple linear operation results corresponding to the feature map as the second output feature maps of the multiple channels corresponding to the feature map, wherein the number of linear operations performed on the feature map in one group is the second ratio.
In an optional implementation, if neither the first ratio, being the ratio of the number of channels of the multiple channel feature maps to the number of channels of the multi-channel second output feature map, nor the second ratio, being the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the multiple channel feature maps, is a positive integer, the grouping of the multiple channel feature maps in the channel dimension and the linear operation on each feature map group to obtain the multi-channel second output feature map include:
obtaining the common divisors of the number of channels of the multiple channel feature maps and the number of channels of the multi-channel second output feature map;
taking one of the common divisors as a target common divisor;
grouping the multiple channel feature maps and the multi-channel second output feature map in the channel dimension according to the target common divisor, wherein the number of feature maps in each feature map group is multiple, the number of second output feature maps in each second output feature map group is multiple, and the number of groups is the target common divisor; and
for each feature map group, respectively performing multiple linear operations on the group to obtain the second output feature map group corresponding to the group.
In a second aspect, an embodiment of the present application provides an image processing apparatus, the image processing apparatus including:
an obtaining subunit, configured to obtain a multi-channel feature map to be processed;
a processing subunit, configured to process the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of the channels and the information exchange between feature maps; and
a fusion subunit, configured to fuse the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map.
In a third aspect, an embodiment of the present application provides an image processing system, the image processing system including a network model, and the network model including the image processing apparatus described in the foregoing implementation.
In an optional implementation, the network model is a pose estimation model, and the pose estimation model includes a preprocessing network, a main network, and a regressor connected in sequence, wherein
the preprocessing network is configured to preprocess an input image to obtain an input feature map;
the main network is configured to process the input feature map to obtain an output feature map, wherein the main network includes basic units, exchange units, and residual units, and at least one of the convolution units corresponding to the parts of the exchange unit where the input and output feature maps have the same resolution, the convolution units in the basic unit, and the convolution unit between two convolution units of the same size in the residual unit is the image processing apparatus; and
the regressor is configured to convert the output feature map into position heatmaps and determine the positions of joint points according to the position heatmaps.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor being capable of executing the machine-executable instructions to implement the image processing method described in any one of the foregoing implementations.
In a fifth aspect, an embodiment of the present application provides a readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the image processing method described in any one of the foregoing implementations is implemented.
Beneficial Effects
According to the image processing method, apparatus, system, electronic device, and readable storage medium provided by the embodiments of the present application, after the multi-channel feature map to be processed is obtained, it is processed through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map; the obtained multi-channel first output feature map and multi-channel second output feature map are then fused to obtain a multi-channel third output feature map. The non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps. The obtained multi-channel third output feature map thus involves both the spatial features of each channel in the multi-channel feature map and the information exchange between feature maps, and the features of each point of the feature maps and the information exchange between channels, so that it contains more feature information and the feature expression capability of the output feature map is improved; meanwhile, the parallel computation avoids prolonging the time taken to obtain the multi-channel third output feature map.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should not be regarded as limiting the scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a schematic block diagram of an electronic device provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application;
FIG. 3 is the first schematic diagram of processing a multi-channel feature map provided by an embodiment of the present application;
FIG. 4 is the second schematic diagram of processing a multi-channel feature map provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of performing an adaptive linear operation on multiple channel feature maps on which a non-point-by-point convolution operation is to be performed, provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of an image processing apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of the structure of HRNet;
FIG. 8 is an example schematic diagram of the multi-resolution blocks in FIG. 7;
FIG. 9 is a schematic diagram of an exchange unit, a basic unit, and a residual unit in a pose estimation model provided by an embodiment of the present application.
Icons: 100 - electronic device; 110 - memory; 120 - processor; 130 - communication unit; 200 - image processing apparatus; 210 - obtaining subunit; 220 - processing subunit; 230 - fusion subunit.
Detailed Description of the Embodiments
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. The components of the embodiments of the present application generally described and shown in the drawings herein can be arranged and designed in various different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the drawings is not intended to limit the scope of the claimed application, but merely represents selected embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be noted that relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
It should be noted that, in the case of no conflict, the features in the embodiments of the present application may be combined with each other.
Referring to FIG. 1, FIG. 1 is a schematic block diagram of an electronic device 100 provided by an embodiment of the present application. The electronic device 100 described in the embodiments of the present application may be, but is not limited to, a desktop computer, a tablet computer, a server, and the like. The electronic device 100 may include a memory 110, a processor 120, and a communication unit 130. The memory 110, the processor 120, and the communication unit 130 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, these elements may be electrically connected to each other through one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 120 is used to read/write data or programs stored in the memory 110 and perform the corresponding functions. For example, an image processing apparatus 200 is stored in the memory 110, and the image processing apparatus 200 includes at least one software function module that can be stored in the memory 110 in the form of software or firmware. The processor 120 executes various functional applications and data processing by running the software programs and modules stored in the memory 110, such as the image processing apparatus 200 in the embodiments of the present application, thereby implementing the image processing method in the embodiments of the present application.
The communication unit 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through a network and to send and receive data through the network.
It should be understood that the structure shown in FIG. 1 is only a schematic structural diagram of the electronic device 100; the electronic device 100 may include more or fewer components than shown in FIG. 1 or have a different configuration from that shown in FIG. 1. Each component shown in FIG. 1 may be implemented in hardware, software, or a combination thereof.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present application. The method can be applied to the electronic device 100 described above. The specific flow of the image processing method is described in detail below.
Step S110: obtain a multi-channel feature map to be processed.
Step S120: process the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map.
The non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps.
Step S130: fuse the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel third output feature map.
In this embodiment, after the multi-channel feature map to be processed is obtained, parallel point-by-point convolution operations and non-point-by-point operations are used to process the multi-channel feature map, thereby obtaining the multi-channel first output feature map and the multi-channel second output feature map. The multi-channel first output feature map is obtained through the point-by-point convolution operation, and the multi-channel second output feature map is obtained through the non-point-by-point convolution operation. Finally, the obtained multi-channel first output feature map and multi-channel second output feature map are fused to obtain the multi-channel third output feature map.
In the process of obtaining the multi-channel third output feature map, point-by-point convolution is used to describe the features of each point and the information exchange between input channels, and non-point-by-point convolution is used to describe the spatial features of each input channel and the information exchange between several input feature maps. That is, the obtained multi-channel third output feature map involves both the spatial features of each channel in the multi-channel feature map and the information exchange between feature maps, and the features of each point of the feature maps and the information exchange between channels. Meanwhile, the two operations are processed in parallel. As a result, the obtained multi-channel third output feature map contains more feature information, which improves the feature expression capability of the output feature map and avoids the situation where valid information is not extracted as a feature because only a certain aspect of information is considered; at the same time, the time taken to obtain the multi-channel third output feature map is not prolonged.
For example, if only non-point-by-point convolution were performed on the multi-channel feature map to be processed and the result were used as the multi-channel third output feature map, with the non-point-by-point convolution describing the spatial features of each input channel and the information exchange between several input feature maps, then this processing would ignore the features of each point in each channel's feature map and the information exchange between input channels. However, the ignored information, which is considered redundant, can give a better understanding of the multi-channel feature map to be processed. Therefore, this solution improves the ability of the multi-channel third output feature map to express the features of the multi-channel feature map to be processed.
Optionally, in this embodiment, the multi-channel feature map may be obtained by receiving a multi-channel feature map sent by another device or by processing an input image.
Referring to FIG. 3, FIG. 3 is the first schematic diagram of processing a multi-channel feature map provided by an embodiment of the present application. In FIG. 3, PWC denotes point-by-point convolution, and non-PWC denotes non-point-by-point convolution. In one implementation of this embodiment, the multi-channel first output feature map, the multi-channel second output feature map, and the multi-channel third output feature map may be obtained in the following manner.
After the multi-channel feature map is obtained, a point-by-point convolution operation is performed on the multi-channel feature map to obtain the multi-channel first output feature map, and a non-point-by-point convolution operation is performed on the multi-channel feature map to obtain the multi-channel second output feature map. That is, the objects processed by point-by-point convolution and non-point-by-point convolution are exactly the same. Then, an accumulation operation, that is, "Add" processing, is performed on the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel third output feature map.
The above accumulation operation refers to adding multiple output feature maps at the same pixel points. For example, if feature map A1 in the multi-channel first output feature map corresponds to feature maps 1 to 3 in the multi-channel feature map, and feature map B1 in the multi-channel second output feature map also corresponds to feature maps 1 to 3 in the multi-channel feature map, then feature map A1 and feature map B1 are accumulated at the same pixel points to realize their fusion, and the fused feature map is the third output feature map of one channel.
Referring to FIG. 4, FIG. 4 is the second schematic diagram of processing a multi-channel feature map provided by an embodiment of the present application. In FIG. 4, Split denotes grouping and Shuffle denotes reorganization. In another implementation of this embodiment, the multi-channel first output feature map, the multi-channel second output feature map, and the multi-channel third output feature map may be obtained in the following manner.
The multi-channel feature map is divided in the channel dimension into a first part of the multi-channel feature map and a second part of the multi-channel feature map. Optionally, the first part and the second part may include the same or different numbers of feature maps; that is, the division of the multi-channel feature map in the channel dimension may be even or uneven. If it is even, the multi-channel feature map is equally divided in the channel dimension into the first part and the second part.
Optionally, in one implementation, when the multi-channel feature map is obtained, it is already divided into the two parts, namely the first part and the second part. In another implementation, after the multi-channel feature map is obtained and before the point-by-point convolution operation and the non-point-by-point operation are performed, the multi-channel feature map is divided into two parts to obtain the first part and the second part. Optionally, the multi-channel feature map may be divided into the first part and the second part by a preset channel separation operator.
A point-by-point convolution operation is performed on the first part of the multi-channel feature map to obtain the multi-channel first output feature map, and a non-point-by-point convolution operation is performed on the second part to obtain the multi-channel second output feature map; that is, the objects processed by point-by-point convolution and non-point-by-point convolution are different. Then, a channel reorganization operation is performed on the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel third output feature map.
After the multi-channel first output feature map and the multi-channel second output feature map are obtained, they may first be channel-merged in a "Concat" manner, and then "Shuffle" processing, that is, channel shuffling, is performed to obtain the multi-channel third output feature map.
Optionally, in this embodiment, the non-point-by-point convolution operation may be any one of a depthwise convolution operation, a group convolution operation, a dilated convolution operation, a deconvolution operation, and a standard convolution operation whose kernel size is not 1 × 1.
Optionally, in this embodiment, the non-point-by-point convolution operation may also be an adaptive linear operation. When the adaptive linear operation is adopted, the multi-channel second output feature map may be obtained in the following manner: according to the number of channels of the multiple channel feature maps on which the non-point-by-point convolution operation is to be performed and the number of channels of the multi-channel second output feature map, the multiple channel feature maps are grouped in the channel dimension, and a linear operation is performed on each feature map group to obtain the multi-channel second output feature map.
The multiple channel feature maps on which the non-point-by-point convolution operation is to be performed are determined by the object targeted by the non-point-by-point convolution operation. As shown in FIG. 3, when point-by-point convolution and non-point-by-point convolution process exactly the same object, the multiple channel feature maps are the multi-channel feature map to be processed. As shown in FIG. 4, when they process different objects, the multiple channel feature maps are the second part of the multi-channel feature map.
When performing the adaptive linear operation, the first ratio and the second ratio may be obtained first; it is then judged whether the first ratio or the second ratio is a positive integer, and the multiple channel feature maps are grouped in the channel dimension according to whichever ratio is a positive integer. The first ratio is the ratio of the number of channels of the multiple channel feature maps to the number of channels of the multi-channel second output feature map, and the second ratio is the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the multiple channel feature maps.
In one implementation of this embodiment, if the first ratio is a positive integer, the multiple channel feature maps may be equally divided in the channel dimension into multiple feature map groups according to the first ratio, wherein the number of feature maps in each feature map group is the first ratio. Then, for each feature map group, a linear operation is performed on the feature map corresponding to each channel in the group, and the cumulative sum of the linear operation results of the group is used as the second output feature map corresponding to the group.
This calculation can be expressed by the following formula:
Y[i] = Φ_i(X[(iα):(iα+α−1)]), if C_i/C_o = α      (1)
where α = 1, 2, ..., N_1 and i = 1, 2, ..., N_2; X denotes the multiple channel feature maps on which the non-point-by-point convolution operation is to be performed, X[i] denotes the feature map of the i-th channel in the multi-channel feature map to be processed, Y denotes the second output feature map, Y[i] denotes the second output feature map of the i-th channel, Φ_i denotes the linear operation corresponding to the second output feature map of each channel, C_i denotes the number of channels of the multi-channel feature map to be processed, and C_o denotes the number of channels of the multi-channel second output feature map. For the above formula, X can be understood as the input feature map, Y as the output feature map, C_i as the number of input channels, and C_o as the number of output channels.
When the first ratio is a positive integer, that is, when the ratio of the number of channels of the multiple channel feature maps to the number of channels of the multi-channel second output feature map is a positive integer, a linear operation is performed, based on Φ_i, on each of α channel feature maps among the multiple channel feature maps, and the sum of the linear operation results is used as the second output feature map Y[i] of the i-th channel. The second output feature map Y[i] is thus obtained by processing α of the multiple channel feature maps.
Optionally, the above linear operation may be implemented by methods such as affine transformation and wavelet transformation. In addition, convolution is an efficient operation that is well supported by hardware and can cover many widely used linear operations, so depthwise separable convolution can also be used to implement the above linear operation; that is, the above adaptive linear operation may be an adaptive convolution operation (Adaptive Convolution, AC).
The above implementation is illustrated below by taking the realization of the linear operation through depthwise separable convolution as an example.
Suppose the multiple channel feature maps are X_1 to X_12, the multi-channel second output feature maps are Y_1 to Y_6, and there are six convolution kernels Φ_1 to Φ_6. The ratio of the number of channels of the multiple channel feature maps to the number of channels of the multi-channel second output feature map is 2.
According to this ratio of 2, the multiple channel feature maps are divided in the channel dimension into six feature map groups: X_1, X_2; X_3, X_4; X_5, X_6; X_7, X_8; X_9, X_10; X_11, X_12. Each feature map group includes two channel feature maps.
After feature maps X_1 and X_2 are each convolved with kernel Φ_1, two result maps are obtained, and their accumulated sum is used as the second output feature map Y_1 corresponding to X_1 and X_2. The second output feature map Y_2 corresponding to X_3 and X_4 is calculated from X_3, X_4 and kernel Φ_2 in the same way. By analogy, the second output feature maps Y_3 to Y_6 can be obtained. Thus, the multi-channel second output feature maps Y_1 to Y_6 are obtained.
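The shapes in this worked example can be checked with a grouped convolution standing in for the six kernels Φ_1 to Φ_6 (one K × K kernel per two-channel group); the 56 × 56 spatial size below is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

# Twelve input maps X_1..X_12, ratio 2, so six two-channel groups feed six
# kernels and yield Y_1..Y_6. One grouped convolution realizes all six
# per-group "convolve each map, then accumulate" steps at once.
x = torch.randn(1, 12, 56, 56)
phi = nn.Conv2d(12, 6, 3, padding=1, groups=6)  # one K x K kernel per group
print(phi(x).shape)  # torch.Size([1, 6, 56, 56])
```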
In another implementation of this embodiment, if the second ratio is a positive integer, the multiple channel feature maps may be equally divided in the channel dimension into multiple feature map groups, wherein the number of feature maps in each group is 1. For each feature map group, multiple linear operations are performed on the feature map in the group, and the multiple linear operation results corresponding to the feature map are used as the second output feature maps of the multiple channels corresponding to the feature map. The number of linear operations performed on the feature map in one group is the second ratio.
This calculation can be expressed by the following formula:
Y[(iα):(iα+α−1)] = Φ_i(X[i]), if C_o/C_i = α        (2)
For the notation in this formula, refer to the notation of formula (1).
When the second ratio is a positive integer, that is, when the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the multiple channel feature maps is a positive integer, each channel feature map among the multiple channel feature maps is subjected, based on Φ_i, to α linear operations, so that α second output feature maps are generated for each channel feature map.
Optionally, the above linear operation may be implemented by methods such as affine transformation and wavelet transformation. In addition, depthwise separable convolution can also be used to implement the above linear operation.
For a standard convolutional layer, assume the size of the input feature map is H_i × W_i × C_i, where H_i denotes the height of the input feature map, W_i denotes its width, and C_i denotes the number of input channels. Assume the size of the output feature map is H_o × W_o × C_o, where H_o denotes the height of the output feature map, W_o denotes its width, and C_o denotes the number of output channels. Then the size of each convolution kernel is K × K × C_i, and the total computational overhead is:
FLOPs(Standard) = H_o × W_o × K × K × C_i × C_o       (3)
Convolution operations such as depthwise convolution and group convolution can effectively reduce FLOPs. Group convolution groups the input feature maps and then convolves each group separately. However, this convolution requires setting a grouping parameter G. If different convolutional layers use the same parameter G, then for feature maps with more channels there will be more channels in each group, which may make the convolution of each group less efficient. When the number of groups equals the number of input channels and the number of output channels equals the number of input channels, group convolution becomes depthwise convolution; although this further reduces FLOPs, depthwise convolution cannot change the number of channels, so it must be followed by a pointwise convolution, which increases latency.
In contrast, the above manner of obtaining the multi-channel second output feature map using the first ratio and the second ratio in this solution neither requires manually setting the parameters of each layer as ordinary group convolution does, nor constrains the number of input channels and output channels to be the same as depthwise convolution does. In this solution, multiple feature maps can be adaptively generated from one feature map, and multiple feature maps can also be adaptively merged into one feature map. In this way, C_i input feature maps can adaptively generate C_o output feature maps (as shown in FIG. 5). The computational cost of the adaptive convolution is:
H_o × W_o × K × K × C_i × C_o / min(C_i, C_o)     (4)
It follows that obtaining the multi-channel second output feature map using the first ratio and the second ratio in this solution reduces the amount of computation.
在本实施例的另一种实施方式中,若第一比值及第二比值均不为正整数,即所述第一比值为所述多个通道特征图的通道数与所述多通道第二输出特征图的通道数之比、所述第二比值为所述多通道第二输出特征图的通道数与所述多个通道特征图的通道数之比,均不为正整数,还可以按照以下方式通过对所述多个通道特征图进行分组,然后计算得到所述多通道第二输出特征图。
获得所述多个通道特征图的通道数和所述多通道第二输出特征图的通道数的公约数。然后,将其中一个公约数作为目标公约数。可选地,可以任意一个公约数作为目标公约数, 或者选择最大公约数作为目标公约数,当然可以以其他方式选出目标公约数。在本实施例的一种方式中,目标公约数为最大公约数。
根据所述目标公约数在通道维度对所述多个通道特征图及所述多通道第二输出特征图进行分组。其中,每个特征图组中特征图的数量为多个,每个第二输出特征图组中第二输出特征图的数量为多个。组的数量为所述目标公约数。针对每个特征图组,对该特征图组分别进行多次线性运算,得到该特征图组对应的第二输出特征图组。
比如,目标公约数为b,则将所述多个通道特征图及所述多通道第二输出特征图进行分成b组。根据一个特征图组得到该特征图组对应的第二输出特征图组的方式与组卷积中分组后的计算方式类似。比如,利用卷积运算实现上述线性运算,一个特征图组中包括4个特征图,一个第二输出特征图组中包括3个第二输出特征图,该特征图组对应的卷积核组中包括了卷积核1、2、3,在获得该特征图组对应的第二输出特征图组时,可通过如下方式实现:根据卷积核1及该特征图组中的4个特征图,得到一个第二输出特征图;根据卷积核2及该特征图组中的4个特征图,再次得到一个第二输出特征图;根据卷积核3及该特征图组中的4个特征图,计算得到第三个第二输出特征图。
The adaptive linear operation describes the information exchange within a group well and can effectively reduce the number of parameters and the FLOPs. Since redundant information in feature maps allows a better understanding of the input data, the embodiments of the present application also retain some of this redundant information through the point-by-point convolution, thereby avoiding the lack of inter-group information communication that would arise if only the adaptive linear operation were used. Thus, by processing the multi-channel feature map to be processed through parallel point-by-point and non-point-by-point convolution operations, the embodiments of the present application improve the feature expression capability of the resulting multi-channel third output feature map while using only a small number of FLOPs and parameters.
The ratio of the computational cost of standard convolution to that of obtaining the multi-channel third output feature map in the present application using formula (4) is:

(H_o × W_o × K × K × C_i × C_o) / (H_o × W_o × C_i × C_o + H_o × W_o × K × K × C_i × C_o / min(C_i, C_o)) = K × K × min(C_i, C_o) / (min(C_i, C_o) + K × K)        (5)
When K << min(C_i, C_o), the ratio of the computational cost of standard convolution to that of obtaining the multi-channel third output feature map using formula (4) can also be approximated as:

K × K        (6)
It can thus be seen that the embodiments of the present application reduce both the computational cost and the number of parameters.
When, in an embodiment of the present application, the number of channels of the first part of the multi-channel feature map equals that of the second part, K << min(C_i, C_o), and the multi-channel third output feature map is obtained through the adaptive convolution corresponding to formula (4) together with channel shuffling, the ratio of the computational cost of standard convolution to that of this embodiment is approximately:

4 × K × K        (7)
When the non-point-by-point operation is the adaptive convolution corresponding to formula (4) and K << min(C_i, C_o), the ratio of the computational cost of obtaining the multi-channel third output feature map without channel shuffling to that of obtaining it with the channels divided evenly and with channel shuffling is approximately:

4        (8)
It can be seen from formula (8) that obtaining the multi-channel third output feature map in the channel-shuffling manner speeds up the computation of the multi-channel third output feature map. Comparing FIG. 3 and FIG. 4 shows that replacing the "Add" operation with "Split", "Concat", and "Shuffle" further reduces the FLOPs while maintaining the information exchange between channels.
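Putting the pieces together, the following is a minimal PyTorch sketch of one possible parallel point-by-point/adaptive unit with "Add" fusion in the style of FIG. 3; the class name, the group-selection rule, and all sizes are our illustrative assumptions rather than the specification's wording:

    import math
    import torch
    import torch.nn as nn

    class PointwiseAdaptiveUnit(nn.Module):
        # Parallel branches: a 1x1 point-by-point convolution and an adaptive
        # convolution whose group count adapts to the channel numbers.
        def __init__(self, c_in, c_out, k=3):
            super().__init__()
            self.pointwise = nn.Conv2d(c_in, c_out, kernel_size=1, bias=False)
            if c_in % c_out == 0 or c_out % c_in == 0:
                groups = min(c_in, c_out)       # first or second ratio is integral
            else:
                groups = math.gcd(c_in, c_out)  # common-divisor fallback
            self.adaptive = nn.Conv2d(c_in, c_out, kernel_size=k,
                                      padding=k // 2, groups=groups, bias=False)

        def forward(self, x):
            # First output (point-by-point) plus second output (non-point-by-point),
            # fused by accumulation into the third output feature map.
            return self.pointwise(x) + self.adaptive(x)

    unit = PointwiseAdaptiveUnit(12, 6)
    print(unit(torch.randn(1, 12, 32, 32)).shape)  # torch.Size([1, 6, 32, 32])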
In order to perform the corresponding steps of the above embodiments and of the possible manners thereof, an implementation of an image processing apparatus 200 is given below. Optionally, the image processing apparatus 200 may adopt the device structure of the electronic device 100 shown in FIG. 1 above. Further, referring to FIG. 6, FIG. 6 is a block diagram of the image processing apparatus 200 provided by an embodiment of the present application. It should be noted that the basic principle of the image processing apparatus 200 provided in this embodiment and the technical effects it produces are the same as those of the above embodiments; for brevity, for anything not mentioned in this embodiment, refer to the corresponding content of the above embodiments. The image processing apparatus 200 can be applied to the electronic device 100. The image processing apparatus 200 can conveniently replace a standard convolution unit without the network architecture having to be changed; that is, a standard convolution or another convolution in an existing neural network structure can be replaced with the image processing apparatus 200. The image processing apparatus 200 is therefore plug-and-play. The image processing apparatus 200 may include an obtaining subunit 210, a processing subunit 220, and a fusion subunit 230.
The obtaining subunit 210 is configured to obtain a multi-channel feature map to be processed.

The processing subunit 220 is configured to process the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of the channels and the information exchange between feature maps.

The fusion subunit 230 is configured to fuse the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map.
Optionally, the above subunits may be stored in the memory 110 shown in FIG. 1 in the form of software or firmware, or be solidified in the operating system (Operating System, OS) of the electronic device 100, and may be executed by the processor 120 in FIG. 1. The data, program code, and the like required to execute the above subunits may likewise be stored in the memory 110.
Existing image processing systems generally use a model to process images. When a model is deployed in a practical application, its parameters, floating-point operations, and latency must all be considered, yet model compression inevitably causes a loss of accuracy. How to balance the accuracy and efficiency of a model is therefore a problem in urgent need of a solution.
Methods such as quantization and pruning can effectively address this problem. Quantization-based methods can reduce redundancy in the computation; however, during inference the shared weights need to be restored to their original positions, so no runtime memory is saved. Pruning-based methods can reduce the redundant connections in a pre-trained model; however, finding the best performance requires a great deal of work.
Still other methods use efficient convolution operations to design an efficient structure. AlexNet first proposed group convolution, which distributes the model across two GPUs (Graphics Processing Unit, graphics processors). Depthwise convolution, as a special form of group convolution, was first proposed in Xception and was applied to good effect in MobileNet. Depthwise convolution requires the numbers of input and output channels to be the same, so a 1×1 convolution is usually added before or after the depthwise convolution layer. In this case, one convolution layer is inevitably decomposed into two or three serial layers. For example, MobileNet splits each layer into two layers, first a depthwise convolution and then a point-by-point convolution. Although this reduces the parameters and FLOPs, it introduces latency.
In view of the above, an embodiment of the present application further provides an image processing system. The image processing system includes a network model, and the network model includes the image processing apparatus 200. The image processing apparatus 200 can adaptively obtain an output feature map of the desired size and improve the feature expression capability of the output feature map without increasing latency, while also reducing the FLOPs and the number of parameters. The processing speed of the image processing system can thereby be increased while the quality of the processing results is guaranteed. The network model may be, but is not limited to, a pose estimation model or another model.
In one implementation of this embodiment, the network model is a pose estimation model, and the pose estimation model includes a pre-processing network, a main network, and a regressor connected in sequence. The pre-processing network is configured to pre-process an input image to obtain an input feature map. The main network is configured to process the input feature map to obtain an output feature map, wherein the main network includes basic units, exchange units, and residual units, and at least one convolution unit among the convolution units of the exchange units corresponding to the parts where the input and output feature maps have the same resolution, the convolution units in the basic units, and the convolution unit between the two equal-sized convolution units in the residual units is the image processing apparatus 200. The regressor is configured to convert the output feature map into position heat maps and determine the positions of the joint points according to the position heat maps.
Optionally, the pre-processing network may be a stem composed of two convolution units used to reduce the resolution; this pre-processing network can reduce the resolution to 1/4 of that of the input image. The input feature map and the output feature map of the main network have the same resolution. By adopting the image processing apparatus 200 of this embodiment, the main network can be made lightweight.
Optionally, the main network may be HRNet, which is briefly introduced below. HRNet starts from a high-resolution branch in the first stage, maintains high-resolution representations throughout the whole process, and in each subsequent stage adds to the current branches, in parallel, a new branch whose resolution is 1/2 of the lowest resolution among the current branches.
As shown in FIG. 7, HRNet contains four stages in total (i.e., the four connected regions with a background color in the network of FIG. 7). The first stage contains four residual units and one exchange unit; each residual unit consists of a bottleneck of width 64 followed by a 3×3 convolution unit. The exchange unit generates two branches: one branch reduces the width of the feature maps to 32, and the other branch reduces the resolution to 1/2 of the original.
Each of the second, third, and fourth stages consists of four parallel basic units and one exchange unit. The second, third, and fourth stages contain 1, 4, and 3 multi-resolution blocks, respectively, and their multi-resolution blocks serve as two-, three-, and four-resolution blocks, respectively. For simplicity of description, each node in FIG. 7 has only one multi-resolution block. Taking the four-resolution block as an example, a four-resolution block contains four branches, each branch contains four basic units (as shown in part a of FIG. 8), and each basic unit contains two 3×3 convolution units. In addition, a cross-resolution exchange unit (as shown in part b of FIG. 8) completes the information interaction among the four branches.
Optionally, in the above pose estimation model, the residual units of the first stage of the HRNet main network, the basic units of the second to fourth stages, and the relevant convolution units in the exchange units may all be the image processing apparatus 200.
For the exchange units, as shown in part a of FIG. 9, the embodiment of the present application sets the convolution units of the parts where the input and output feature maps have the same resolution (i.e., the dark gray parts in part a of FIG. 9) to be the image processing apparatus 200, so that the pose estimation model includes lightweight exchange units. For the basic units, the two convolution units in each basic unit (i.e., the dark gray parts in part b of FIG. 9) are set to be the image processing apparatus 200, so that the pose estimation model includes lightweight basic units. For the residual units, the convolution unit between the two equal-sized convolution units in each residual unit (i.e., the dark gray part in part c of FIG. 9) is set to be the image processing apparatus 200, so that the pose estimation model includes lightweight residual units.
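For illustration, a lightweight basic unit could then be assembled by substituting the PointwiseAdaptiveUnit sketched earlier for the two 3×3 convolution units; the composition below, including the normalization and activation placement, is a hypothetical sketch rather than the patent's exact wiring:

    import torch.nn as nn

    # Reuses the PointwiseAdaptiveUnit sketched earlier in place of the two
    # 3x3 convolution units of an HRNet-style basic unit.
    def make_lightweight_basic_unit(c, k=3):
        return nn.Sequential(
            PointwiseAdaptiveUnit(c, c, k), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            PointwiseAdaptiveUnit(c, c, k), nn.BatchNorm2d(c),
        )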
The pose network model in the embodiments of the present application is a lightweight model. This pose network model can improve the efficiency of pose estimation while obtaining comparable accuracy.
An embodiment of the present application further provides a readable storage medium on which a computer program is stored, the computer program implementing the image processing method described above when executed by a processor.
In summary, the embodiments of the present application provide an image processing method, apparatus, and system, an electronic device, and a readable storage medium. After a multi-channel feature map to be processed is obtained, the multi-channel feature map is processed through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map; the obtained multi-channel first output feature map and multi-channel second output feature map are then fused to obtain a multi-channel third output feature map. The non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps. The obtained multi-channel third output feature map therefore involves both the spatial features of each channel of the multi-channel feature map and the information exchange between feature maps, as well as the feature of every point of the feature maps and the information exchange between channels, so that the multi-channel third output feature map contains more feature information and the feature expression capability of the output feature map is improved; meanwhile, the use of parallel computation avoids lengthening the time needed to obtain the multi-channel third output feature map.
The above is only a specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily conceivable by any person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

  1. An image processing method, characterized in that the method comprises:
    obtaining a multi-channel feature map to be processed;
    processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of each channel and the information exchange between feature maps;
    fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel third output feature map;
    wherein the non-point-by-point convolution operation is an adaptive linear operation, and obtaining the multi-channel second output feature map comprises:
    grouping, in the channel dimension, a plurality of channel feature maps on which the non-point-by-point convolution operation is to be performed according to the number of channels of the plurality of channel feature maps and the number of channels of the multi-channel second output feature map, and performing a linear operation on each feature map group to obtain the multi-channel second output feature map;
    if a first ratio is a positive integer, wherein the first ratio is the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map, the grouping, in the channel dimension, the plurality of channel feature maps according to the number of channels of the plurality of channel feature maps on which the non-point-by-point convolution operation is to be performed and the number of channels of the multi-channel second output feature map, and performing a linear operation on each feature map group to obtain the multi-channel second output feature map comprises:
    evenly dividing the plurality of channel feature maps in the channel dimension into multiple feature map groups according to the first ratio, wherein the number of feature maps in each feature map group equals the first ratio;
    for each feature map group, performing a linear operation separately on the feature map corresponding to each channel in the feature map group, and taking the accumulated sum of the linear operation results of the feature map group as a second output feature map corresponding to the feature map group.
  2. The method according to claim 1, characterized in that
    the processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map comprises:
    performing a point-by-point convolution operation on the multi-channel feature map to obtain the multi-channel first output feature map;
    performing a non-point-by-point convolution operation on the multi-channel feature map to obtain the multi-channel second output feature map;
    and the fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map comprises:
    accumulating the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel output feature map.
  3. The method according to claim 1, characterized in that the multi-channel feature map is divided in the channel dimension into a first part of the multi-channel feature map and a second part of the multi-channel feature map, and the processing the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map comprises:
    performing a point-by-point convolution operation on the first part of the multi-channel feature map to obtain the multi-channel first output feature map;
    performing a non-point-by-point convolution operation on the second part of the multi-channel feature map to obtain the multi-channel second output feature map; and the fusing the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map comprises:
    performing a channel shuffle operation on the multi-channel first output feature map and the multi-channel second output feature map to obtain the multi-channel third output feature map.
  4. The method according to claim 1, characterized in that, if a second ratio is a positive integer, the second ratio being the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the plurality of channel feature maps, the grouping, in the channel dimension, the plurality of channel feature maps according to the number of channels of the plurality of channel feature maps on which the non-point-by-point convolution operation is to be performed and the number of channels of the multi-channel second output feature map, and performing a linear operation on each feature map group to obtain the multi-channel second output feature map comprises:
    evenly dividing the plurality of channel feature maps in the channel dimension into multiple feature map groups, wherein the number of feature maps in each feature map group is 1;
    for each feature map group, performing multiple linear operations on the feature map in the feature map group, and taking the multiple linear operation results corresponding to the feature map as second output feature maps of multiple channels corresponding to the feature map, wherein the number of linear operations performed on the feature map in one feature map group equals the second ratio.
  5. The method according to claim 1, characterized in that, if neither a first ratio nor a second ratio is a positive integer, wherein the first ratio is the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map, and the second ratio is the ratio of the number of channels of the multi-channel second output feature map to the number of channels of the plurality of channel feature maps, the grouping, in the channel dimension, the plurality of channel feature maps according to the number of channels of the plurality of channel feature maps on which the non-point-by-point convolution operation is to be performed and the number of channels of the multi-channel second output feature map, and performing a linear operation on each feature map group to obtain the multi-channel second output feature map comprises:
    obtaining common divisors of the number of channels of the plurality of channel feature maps and the number of channels of the multi-channel second output feature map, and taking one of the common divisors as a target common divisor;
    grouping the plurality of channel feature maps and the multi-channel second output feature map in the channel dimension according to the target common divisor, wherein each feature map group contains multiple feature maps, each second-output feature map group contains multiple second output feature maps, and the number of groups equals the target common divisor;
    for each feature map group, performing multiple linear operations separately on the feature map group to obtain a second-output feature map group corresponding to the feature map group.
  6. An image processing apparatus, characterized in that the image processing apparatus comprises:
    an obtaining subunit, configured to obtain a multi-channel feature map to be processed;
    a processing subunit, configured to process the multi-channel feature map through parallel point-by-point convolution operations and non-point-by-point operations to obtain a multi-channel first output feature map and a multi-channel second output feature map, wherein the non-point-by-point convolution is used to describe the spatial features of the channels and the information exchange between feature maps;
    a fusion subunit, configured to fuse the multi-channel first output feature map and the multi-channel second output feature map to obtain a multi-channel output feature map;
    wherein the non-point-by-point convolution operation is an adaptive linear operation, and the processing subunit is specifically configured to:
    group, in the channel dimension, a plurality of channel feature maps on which the non-point-by-point convolution operation is to be performed according to the number of channels of the plurality of channel feature maps and the number of channels of the multi-channel second output feature map, and perform a linear operation on each feature map group to obtain the multi-channel second output feature map;
    if a first ratio is a positive integer, wherein the first ratio is the ratio of the number of channels of the plurality of channel feature maps to the number of channels of the multi-channel second output feature map, the manner in which the processing subunit groups, in the channel dimension, the plurality of channel feature maps according to the number of channels of the plurality of channel feature maps on which the non-point-by-point convolution operation is to be performed and the number of channels of the multi-channel second output feature map, and performs a linear operation on each feature map group to obtain the multi-channel second output feature map, comprises:
    evenly dividing the plurality of channel feature maps in the channel dimension into multiple feature map groups according to the first ratio, wherein the number of feature maps in each feature map group equals the first ratio;
    for each feature map group, performing a linear operation separately on the feature map corresponding to each channel in the feature map group, and taking the accumulated sum of the linear operation results of the feature map group as a second output feature map corresponding to the feature map group.
  7. An image processing system, characterized in that the image processing system comprises a network model, and the network model comprises the image processing apparatus according to claim 6.
  8. The system according to claim 7, characterized in that the network model is a pose estimation model, and the pose estimation model comprises a pre-processing network, a main network, and a regressor connected in sequence,
    the pre-processing network being configured to pre-process an input image to obtain an input feature map;
    the main network being configured to process the input feature map to obtain an output feature map, wherein the main network comprises basic units, exchange units, and residual units, and at least one convolution unit among the convolution units of the exchange units corresponding to the parts where the input and output feature maps have the same resolution, the convolution units in the basic units, and the convolution unit between the two equal-sized convolution units in the residual units is the image processing apparatus;
    the regressor being configured to convert the output feature map into position heat maps and determine the positions of joint points according to the position heat maps.
  9. An electronic device, characterized by comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, and the processor being capable of executing the machine-executable instructions to implement the image processing method according to any one of claims 1-5.
  10. A readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the image processing method according to any one of claims 1-5.
PCT/CN2020/140923 2020-08-05 2020-12-29 Image processing method, apparatus, system, electronic device, and readable storage medium WO2022027917A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/388,043 US20220044370A1 (en) 2020-08-05 2021-07-29 Image processing methods

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010775325.2 2020-08-05
CN202010775325.2A CN111652330B (zh) 2020-08-05 2020-08-05 Image processing method, apparatus, system, electronic device, and readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/388,043 Continuation US20220044370A1 (en) 2020-08-05 2021-07-29 Image processing methods

Publications (1)

Publication Number Publication Date
WO2022027917A1 true WO2022027917A1 (zh) 2022-02-10

Family

ID=72352594

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/140923 WO2022027917A1 (zh) 2020-08-05 2020-12-29 图像处理方法、装置、系统、电子设备及可读存储介质

Country Status (2)

Country Link
CN (1) CN111652330B (zh)
WO (1) WO2022027917A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912888A (zh) * 2023-09-12 2023-10-20 Shenzhen Xumi Yuntu Space Technology Co., Ltd. Object recognition method and apparatus, electronic device, and storage medium
CN117115641A (zh) * 2023-07-20 2023-11-24 Aerospace Information Research Institute, Chinese Academy of Sciences Building information extraction method and apparatus, electronic device, and storage medium
CN117115641B (zh) 2023-07-20 2024-03-22 Aerospace Information Research Institute, Chinese Academy of Sciences Building information extraction method and apparatus, electronic device, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111652330B (zh) * 2020-08-05 2020-11-13 UBTECH Robotics Corp Ltd Image processing method, apparatus, system, electronic device, and readable storage medium
CN112200090B (zh) * 2020-10-12 2022-07-01 Guilin University of Electronic Technology Hyperspectral image classification method based on a cross-group spatial-spectral feature enhancement network
CN112101318A (zh) * 2020-11-17 2020-12-18 UBTECH Robotics Corp Ltd Image processing method, apparatus, and device based on a neural network model, and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485192A (zh) * 2015-09-02 2017-03-08 Fujitsu Ltd Training method and apparatus of neural network for image recognition
CN108304920A (zh) * 2018-02-02 2018-07-20 Hubei University of Technology Method for optimizing a multi-scale learning network based on MobileNets
CN109903221A (zh) * 2018-04-04 2019-06-18 Huawei Technologies Co., Ltd. Image super-resolution method and apparatus
US10387753B1 (en) * 2019-01-23 2019-08-20 StradVision, Inc. Learning method and learning device for convolutional neural network using 1×1 convolution for image recognition to be used for hardware optimization, and testing method and testing device using the same
CN111652330A (zh) * 2020-08-05 2020-09-11 UBTECH Robotics Corp Ltd Image processing method, apparatus, system, electronic device, and readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103456014A (zh) * 2013-09-04 2013-12-18 Northwestern Polytechnical University Scene matching suitability analysis method based on a multi-feature integration visual attention model
US20200090030A1 (en) * 2018-09-19 2020-03-19 British Cayman Islands Intelligo Technology Inc. Integrated circuit for convolution calculation in deep neural network and method thereof
CN110321932B (zh) * 2019-06-10 2021-06-25 Zhejiang University City-wide air quality index estimation method based on deep multi-source data fusion
CN110378943A (zh) * 2019-06-21 2019-10-25 Beijing Dajia Internet Information Technology Co., Ltd. Image processing method and apparatus, electronic device, and storage medium
CN110309836B (zh) * 2019-07-01 2021-05-18 Beijing Horizon Robotics Technology R&D Co., Ltd. Image feature extraction method and apparatus, storage medium, and device


Also Published As

Publication number Publication date
CN111652330A (zh) 2020-09-11
CN111652330B (zh) 2020-11-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20948743; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20948743; Country of ref document: EP; Kind code of ref document: A1)