CN112949651A - Feature extraction method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN112949651A
CN112949651A (application CN202110129778.2A)
Authority
CN
China
Prior art keywords
feature
feature map
processing
map
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110129778.2A
Other languages
Chinese (zh)
Inventor
刘钰安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110129778.2A priority Critical patent/CN112949651A/en
Publication of CN112949651A publication Critical patent/CN112949651A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a feature extraction method and apparatus, a storage medium, and an electronic device. A first feature coding module and a second feature coding module extract a low-level feature map and a high-level feature map of an image to be processed, respectively; a feature aggregation module samples each of the two feature maps to the resolution of the other and fuses them to obtain a first fused feature map and a second fused feature map, and then samples the two fused feature maps to the same resolution and fuses them to obtain a target fused feature map of the image to be processed. By extracting the low-level features and the high-level features separately and fusing them, the scheme improves the feature extraction effect and thereby the accuracy of image segmentation.

Description

Feature extraction method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of feature extraction technologies, and in particular, to a feature extraction method and apparatus, a storage medium, and an electronic device.
Background
With the development of technology, image segmentation has been widely applied in fields such as photography, film and television production, video surveillance, and image compression. Image segmentation is the technique and process of dividing an image into a number of specific regions with unique properties based on image features and extracting the image region of interest.
In the related art, image features are extracted with low accuracy when an image is segmented, which in turn results in low segmentation accuracy.
Disclosure of Invention
The embodiment of the application provides a feature extraction method and apparatus, a storage medium, and an electronic device, which can improve the accuracy of extracting image features when segmenting portraits in images and videos, and correspondingly improve the accuracy of portrait segmentation.
In a first aspect, an embodiment of the present application provides a feature extraction method, including:
extracting a low-level feature map of an image to be processed through a first feature coding module, and extracting a high-level feature map of the image to be processed through a second feature coding module;
performing first processing on the low-level feature map through a feature aggregation module to obtain a first feature map with the same resolution as that of the low-level feature map, and performing second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as that of the low-level feature map;
fusing the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map;
performing third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map, and performing fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map;
fusing the third feature map and the fourth feature map through the feature aggregation module to obtain a second fused feature map;
and sampling the first fusion feature map and the second fusion feature map to the same resolution through the feature aggregation module, and then fusing to obtain a target fusion feature map corresponding to the image to be processed.
In a second aspect, an embodiment of the present application further provides a feature extraction apparatus, including:
the feature extraction module is used for extracting a low-level feature map of the image to be processed through the first feature coding module and extracting a high-level feature map of the image to be processed through the second feature coding module;
the first processing module is used for performing first processing on the low-level feature map through the feature aggregation module to obtain a first feature map with the same resolution as that of the low-level feature map, and performing second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as that of the low-level feature map;
the first feature fusion module is used for fusing the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map;
the second processing module is used for performing third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map, and performing fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map;
the second feature fusion module is used for fusing the third feature map and the fourth feature map through the feature aggregation module to obtain a second fused feature map;
and the third feature fusion module is used for sampling the first fusion feature map and the second fusion feature map to the same resolution through the feature aggregation module and then fusing the first fusion feature map and the second fusion feature map to obtain a target fusion feature map corresponding to the image to be processed.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the feature extraction method provided in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory, where the memory has a computer program, and the processor is configured to execute the feature extraction method provided in any embodiment of the present application by calling the computer program.
According to the technical scheme provided by the embodiment of the application, a first feature coding module is used for extracting a low-level feature map of an image to be processed, and a second feature coding module is used for extracting a high-level feature map of the image to be processed; first processing is performed on the low-level feature map through a feature aggregation module to obtain a first feature map with the same resolution as the low-level feature map, and second processing is performed on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as the low-level feature map; the first feature map and the second feature map are fused through the feature aggregation module to obtain a first fused feature map; third processing is performed on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as the high-level feature map, and fourth processing is performed on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as the high-level feature map; the third feature map and the fourth feature map are fused through the feature aggregation module to obtain a second fused feature map; and the first fused feature map and the second fused feature map are sampled to the same resolution through the feature aggregation module and then fused to obtain a target fused feature map corresponding to the image to be processed. By extracting the low-level features and the high-level features separately and fusing them, the scheme improves the feature extraction effect and thereby the accuracy of image segmentation.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a first feature extraction method provided in an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a portrait segmentation model of the feature extraction method provided in the embodiment of the present application.
Fig. 3 is a second flowchart of the feature extraction method according to the embodiment of the present application.
Fig. 4 is a schematic structural diagram of a first feature encoding module according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a second feature encoding module according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a seventh feature encoding unit of a second feature encoding module according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an eighth feature encoding unit of the second feature encoding module according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of a ninth feature encoding unit of the second feature encoding module according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a feature aggregation module according to an embodiment of the present application.
Fig. 10 is a schematic diagram of a basic composition structure provided in an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a feature decoding module according to an embodiment of the present application.
Fig. 12 is a schematic structural diagram of a feature extraction device according to an embodiment of the present application.
Fig. 13 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 14 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The feature extraction method may be executed by the feature extraction apparatus provided in the embodiment of the present application, or by an electronic device integrated with the feature extraction apparatus, where the feature extraction apparatus may be implemented in hardware or software. The electronic device may be a smartphone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
Referring to fig. 1, fig. 1 is a first flowchart of a feature extraction method according to an embodiment of the present disclosure. The specific process of the feature extraction method provided by the embodiment of the application can be as follows:
101. and extracting a low-level feature map of the image to be processed through the first feature coding module, and extracting a high-level feature map of the image to be processed through the second feature coding module.
The image to be processed may be an image containing any content, for example, the image to be processed may be a video frame in a video image.
Low-level features refer to relatively concrete features in an image, such as contours, edges, colors, textures, and shapes. High-level features are more abstract: for example, extracting low-level features from a face yields the face outline, nose, eyes, and the like, whereas extracting high-level features yields whether the region is a face, the facial expression, and so on.
In an embodiment, the present disclosure may be applied to image segmentation. Referring to fig. 2, fig. 2 is a schematic structural diagram of a portrait segmentation model, which includes a first feature encoding module and a second feature encoding module. For example, a video frame may be input into the first feature encoding module, which extracts the low-level features of the video frame and obtains a low-level feature map from them; the video frame is also input into the second feature encoding module, which extracts the high-level features of the video frame and obtains a high-level feature map from them.
102. And performing first processing on the low-level feature map through the feature aggregation module to obtain a first feature map with the same resolution as the low-level feature map, and performing second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as the low-level feature map.
For example, a video frame may be input to the feature aggregation module; the feature aggregation module performs first processing on the low-level feature map of the video frame to obtain a first feature map with the same resolution as the low-level feature map, and performs second processing on the high-level feature map of the video frame to obtain a second feature map with the same resolution as the low-level feature map.
103. And fusing the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map.
For example, the first feature map and the second feature map of the video frame may be subjected to fusion processing to obtain a first fused feature map.
104. And performing third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as the high-level feature map, and performing fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as the high-level feature map.
For example, the feature aggregation module continues to perform third processing on the low-level feature map of the video frame to obtain a third feature map with the same resolution as the high-level feature map, and performs fourth processing on the high-level feature map of the video frame to obtain a fourth feature map with the same resolution as the high-level feature map.
105. And fusing the third feature map and the fourth feature map through a feature aggregation module to obtain a second fused feature map.
For example, the feature aggregation module performs fusion processing on the third feature map and the fourth feature map of the video frame to obtain a second fused feature map.
106. And sampling the first fusion feature map and the second fusion feature map to the same resolution through a feature aggregation module, and then fusing to obtain a target fusion feature map corresponding to the image to be processed.
For example, the first fusion feature map and the second fusion feature map of the video frame are sampled to have the same resolution as the low-level feature map and then are fused to obtain the target fusion feature map of the corresponding video frame.
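Before moving on to the decoding module, the data flow of steps 101 to 106 can be summarized with a minimal, hedged sketch. The sketch below assumes PyTorch (the framework named later in the description), that the high-level feature map has one quarter of the spatial resolution of the low-level feature map, and that the unspecified "first" to "fourth" processings can be stood in for by simple resampling; element-wise multiplication and addition are used for fusion, following the detailed embodiment described later. It only illustrates the data flow, not the claimed implementation.

```python
import torch
import torch.nn.functional as F

# Assumed shapes: low-level map at H x W, high-level map at H/4 x W/4, both 128 channels.
low = torch.randn(1, 128, 256, 256)   # low-level feature map
high = torch.randn(1, 128, 64, 64)    # high-level feature map

# Steps 102-103: bring the high-level map up to the low-level resolution and fuse.
high_up = F.interpolate(high, size=low.shape[2:], mode='bilinear', align_corners=False)
first_fused = low * high_up            # first fused feature map, H x W

# Steps 104-105: bring the low-level map down to the high-level resolution and fuse.
low_down = F.adaptive_avg_pool2d(low, output_size=high.shape[2:])
second_fused = low_down * high         # second fused feature map, H/4 x W/4

# Step 106: sample both fused maps to the same resolution and fuse again.
second_up = F.interpolate(second_fused, size=first_fused.shape[2:],
                          mode='bilinear', align_corners=False)
target_fused = first_fused + second_up  # target fused feature map
print(target_fused.shape)               # torch.Size([1, 128, 256, 256])
```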
In an embodiment, referring to fig. 2, the portrait segmentation model may further include a feature decoding module, which may input the target fusion feature map of the video frame into the feature decoding module, and decode the target fusion feature map of the video frame through the feature decoding module to obtain a portrait segmentation mask map corresponding to the video frame.
In an embodiment, a video frame in a video image may be acquired at intervals of a certain number of frames according to the processing capability of the current electronic device, the acquired video frame sequence is input into a portrait segmentation model to obtain corresponding portrait segmentation mask image sequences, and the corresponding portraits in the original video frame sequence are segmented according to the portrait segmentation mask image sequences to obtain a portrait segmentation result of the video image.
In particular implementation, the present application is not limited by the execution sequence of the described steps, and some steps may be performed in other sequences or simultaneously without conflict.
As can be seen from the above, in the feature extraction method provided in the embodiment of the present application, the first feature coding module and the second feature coding module extract the low-level feature map and the high-level feature map of the image to be processed, respectively; the feature aggregation module samples each of the two feature maps to the resolution of the other and fuses them to obtain a first fused feature map and a second fused feature map, and then samples the two fused feature maps to the same resolution and fuses them to obtain the target fused feature map of the image to be processed. By extracting the low-level features and the high-level features separately and fusing them, the scheme improves the feature extraction effect and thereby the accuracy of image segmentation.
The method according to the preceding embodiment is illustrated in further detail below by way of example.
Referring to fig. 3, fig. 3 is a second flow chart of the feature extraction method according to the embodiment of the present application. The method comprises the following steps:
201. and extracting a low-level feature map of the image to be processed through the first feature coding module, and extracting a high-level feature map of the image to be processed through the second feature coding module.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a first feature encoding module according to an embodiment of the present disclosure. The first feature encoding module comprises a first feature encoding unit, a second feature encoding unit, a third feature encoding unit, a fourth feature encoding unit, a fifth feature encoding unit and a sixth feature encoding unit.
The first feature coding unit, the third feature coding unit and the fifth feature coding unit comprise a standard convolution layer, a batch regularization layer and an activation function layer which are connected in sequence; the second feature coding unit, the fourth feature coding unit and the sixth feature coding unit comprise a standard convolution layer and a batch regularization layer which are connected in sequence.
In an embodiment, the step of "extracting a low-level feature map of an image to be processed by a first feature encoding module" may include the steps of:
(1) coding an image to be processed by a first feature coding unit which is input as a first image channel and output as a second image channel to obtain first feature coded data;
for example, the first image channel may be 3, and the second image channel may be 64, that is, the first feature encoding data is obtained by encoding the image to be processed by the first feature encoding unit which inputs 3 channels and outputs 64 channels.
(2) Coding the first feature coded data through a second feature coding unit which is input as a second image channel and output as the second image channel to obtain second feature coded data;
for example, the first feature coded data is encoded by the second feature coding unit with the second image channel of 64 in step (1), that is, 64 channels are input and 64 channels are output, so as to obtain second feature coded data.
(3) Coding the second feature coded data through a third feature coding unit which is input as a second image channel and output as the second image channel to obtain third feature coded data;
the second image channel is 64, above.
(4) Coding the third feature coded data through N fourth feature coding units which are input into the second image channel and output into the second image channel to obtain fourth feature coded data, wherein N is a positive integer and is not less than 2;
the second image channel is 64, above.
(5) Encoding the fourth feature encoded data by a fifth feature encoding unit which is input as a second image channel and output as a third image channel to obtain fifth feature encoded data;
for example, the third image channel may be 128, that is, the fourth feature encoded data is encoded by the fifth feature encoding unit which inputs 64 channels and outputs 128 channels, resulting in fifth feature encoded data. (6) And encoding the fifth feature encoded data by M sixth feature encoding units which are input into the third image channel and output into the third image channel to obtain a low-level feature map, wherein M is a positive integer and is not less than 3.
Similarly, the fifth feature coded data is coded by M sixth feature coding units with 128 channels of input and 128 channels of output, and a low-level feature map is obtained.
That is, the first feature encoding module takes a 3-channel RGB video frame as input and outputs a 128-channel feature map.
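As a hedged illustration only, the first feature encoding module described above could be sketched in PyTorch as follows. The description fixes the channel widths (3 to 64 to 128) and the Conv-BN(-ReLU) composition of the units; the kernel sizes, the placement of the stride-2 layers, and the values N = 2 and M = 3 used here are assumptions (two stride-2 units are used so the output resolution is 1/4 of the input, matching the resolutions stated later).

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, stride=1):
    """Standard convolution + batch regularization + activation (units 1, 3, 5)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def conv_bn(in_ch, out_ch):
    """Standard convolution + batch regularization (units 2, 4, 6)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )

class FirstFeatureEncoder(nn.Module):
    """Sketch of the first feature encoding module: 3 -> 64 -> 128 channels."""
    def __init__(self, n=2, m=3):
        super().__init__()
        layers = [conv_bn_relu(3, 64, stride=2),   # first feature encoding unit
                  conv_bn(64, 64),                 # second feature encoding unit
                  conv_bn_relu(64, 64, stride=2)]  # third feature encoding unit
        layers += [conv_bn(64, 64) for _ in range(n)]    # N fourth feature encoding units
        layers.append(conv_bn_relu(64, 128))             # fifth feature encoding unit
        layers += [conv_bn(128, 128) for _ in range(m)]  # M sixth feature encoding units
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

# Usage: a 3-channel RGB frame in, a 128-channel low-level feature map out.
feat = FirstFeatureEncoder()(torch.randn(1, 3, 1024, 1024))
```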
Referring to fig. 5, fig. 5 is a schematic structural diagram of a second feature encoding module according to an embodiment of the present disclosure. The second feature coding module comprises a seventh feature coding unit, an eighth feature coding unit, a ninth feature coding unit, a tenth feature coding unit, an eleventh feature coding unit, a twelfth feature coding unit and a thirteenth feature coding unit.
In an embodiment, the step "extracting a high-level feature map of an image to be processed by a second feature encoding module" may include the following steps:
(1) encoding the image to be processed by a seventh feature encoding unit which is input as a fourth image channel and output as a fifth image channel to obtain sixth feature encoded data;
for example, the fourth image channel may be 3 and the fifth image channel may be 16.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a seventh feature encoding unit according to an embodiment of the present application. The seventh feature coding unit comprises a convolution layer, an activation function layer, a first feature coding subunit, a second feature coding subunit, a maximum pooling layer and a splicing layer. The first characteristic coding subunit comprises a standard convolution layer, a batch regularization layer and an activation function layer which are sequentially connected; the second feature coding subunit comprises a standard convolution layer, a batch regularization layer and an activation function layer which are connected in sequence.
In an embodiment, step (1) may further include the steps of:
(11) performing standard convolution processing and activation processing on the image to be processed sequentially through the convolution layer and the activation function layer to obtain first processing data;
(12) carrying out standard convolution processing, batch regularization processing and activation processing on the first processing data in sequence through a first feature coding subunit to obtain second processing data;
(13) carrying out convolution processing, batch regularization processing and activation processing on the second processed data in sequence through a second feature coding subunit to obtain third processed data;
(14) performing pooling processing on the first processing data through the maximum pooling layer to obtain fourth processing data;
(15) and splicing the third processed data and the fourth processed data through the splicing layer to obtain the sixth feature coded data.
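A hedged PyTorch sketch of the seventh feature encoding unit described in steps (11) to (15) follows. The 3x3 kernels, the stride-2 downsampling, and splitting the 16 output channels evenly between the convolutional branch and the max-pooling branch are assumptions not fixed by the description.

```python
import torch
import torch.nn as nn

class SeventhEncodingUnit(nn.Module):
    """Sketch: conv + ReLU stem, then a convolutional branch and a max-pooling
    branch whose outputs are concatenated (steps (11)-(15))."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        stem_ch = out_ch // 2
        self.stem = nn.Sequential(                       # convolution layer + activation layer
            nn.Conv2d(in_ch, stem_ch, 3, stride=2, padding=1, bias=False),
            nn.ReLU(inplace=True),
        )
        self.branch = nn.Sequential(                     # first and second feature coding subunits
            nn.Conv2d(stem_ch, stem_ch, 3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(stem_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(stem_ch, stem_ch, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(stem_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # maximum pooling layer

    def forward(self, x):
        stem = self.stem(x)                              # first processed data
        conv_out = self.branch(stem)                     # third processed data
        pool_out = self.pool(stem)                       # fourth processed data
        return torch.cat([conv_out, pool_out], dim=1)    # splicing layer -> 16 channels
```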
(2) Encoding the sixth feature encoded data by an eighth feature encoding unit which is input as a fifth image channel and output as a sixth image channel to obtain seventh feature encoded data;
referring to fig. 7, fig. 7 is a schematic structural diagram of an eighth feature encoding unit according to an embodiment of the present application. The eighth feature coding unit comprises a third feature coding subunit, a fourth feature coding subunit, a fifth feature coding subunit, a sixth feature coding subunit, a seventh feature coding subunit, an eighth feature coding subunit, a feature superposition layer (add layer) and an activation function layer. The third feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the fourth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the fifth characteristic coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the sixth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the seventh feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the eighth feature coding subunit comprises a depth convolution layer, a batch regularization layer, a point convolution layer and a batch regularization layer which are connected in sequence.
For example, as above, the fifth image channel is 16 and the sixth image channel may be 32.
In an embodiment, step (2) may further include the following steps:
(21) sequentially carrying out convolution processing and batch regularization processing on the sixth characteristic coded data through a third characteristic coding subunit, a fourth characteristic coding subunit, a fifth characteristic coding subunit, a sixth characteristic coding subunit and a seventh characteristic coding subunit to obtain fifth processed data;
(22) carrying out deep convolution processing, batch regularization processing, point convolution processing and batch regularization processing on the fifth processed data in sequence through the eighth feature coding subunit to obtain sixth processed data;
(23) adding the fifth processed data and the sixth processed data through the feature superposition layer to obtain seventh feature processed data;
(24) and activating the seventh feature processing data through the activation function layer to obtain seventh feature coded data.
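One possible reading of steps (21) to (24) is sketched below in PyTorch: the sixth feature coded data passes through a chain of Conv-BN subunits, the result then goes through a depthwise-plus-pointwise (Conv-BN) subunit, the two intermediate results are added, and a ReLU is applied. The 16-to-32 channel change, the 3x3 kernels, and placing the stride-2 downsampling in the first subunit are assumptions.

```python
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch, stride=1, groups=1, k=3):
    """Convolution followed by batch regularization."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=stride, padding=k // 2, groups=groups, bias=False),
        nn.BatchNorm2d(out_ch),
    )

class EighthEncodingUnit(nn.Module):
    """Sketch of the eighth feature encoding unit (16 -> 32 channels assumed)."""
    def __init__(self, in_ch=16, out_ch=32):
        super().__init__()
        # Third to seventh feature coding subunits: standard Conv + BN, chained.
        self.main = nn.Sequential(
            conv_bn(in_ch, out_ch, stride=2),    # assumed downsampling position
            conv_bn(out_ch, out_ch),
            conv_bn(out_ch, out_ch),
            conv_bn(out_ch, out_ch),
            conv_bn(out_ch, out_ch),
        )
        # Eighth feature coding subunit: depthwise Conv + BN + pointwise Conv + BN.
        self.depthwise = nn.Sequential(
            conv_bn(out_ch, out_ch, groups=out_ch),   # depthwise 3x3
            conv_bn(out_ch, out_ch, k=1),             # pointwise 1x1
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        fifth = self.main(x)                  # fifth processed data
        sixth = self.depthwise(fifth)         # sixth processed data
        return self.act(fifth + sixth)        # feature superposition layer + activation
```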
(3) Coding the seventh feature coded data by J ninth feature coding units which are input into a sixth image channel and output into the sixth image channel to obtain eighth feature coded data, wherein J is a positive integer and is not less than 2;
referring to fig. 8, fig. 8 is a schematic structural diagram of a ninth feature encoding unit according to an embodiment of the present application. The ninth feature coding unit comprises a ninth feature coding subunit, a tenth feature coding subunit, an eleventh feature coding subunit, a twelfth feature coding subunit, a feature superposition layer and an activation function layer. The ninth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the tenth characteristic coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the eleventh feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the twelfth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are connected in sequence.
Similarly, the sixth image channel is 32, that is, the seventh feature coded data is coded by J ninth feature coding units, which are input as 32 channels and output as 32 channels, to obtain eighth feature coded data.
In an embodiment, step (3) may further include the steps of:
(31) the seventh feature coded data is sequentially coded through the ninth feature coding subunit, the tenth feature coding subunit, the eleventh feature coding subunit and the twelfth coding subunit to obtain eighth feature processed data;
(32) adding the seventh feature coded data and the eighth feature processed data through the feature superposition layer to obtain ninth processed data;
(33) and activating the ninth processed data through the activation function layer to obtain eighth feature coded data.
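Steps (31) to (33) describe an identity-shortcut residual block. A hedged PyTorch sketch follows; the 3x3 kernels and the 32-channel width (the sixth image channel) are assumptions.

```python
import torch
import torch.nn as nn

class NinthEncodingUnit(nn.Module):
    """Sketch: four Conv + BN subunits, an identity shortcut, and a final ReLU."""
    def __init__(self, channels=32):
        super().__init__()
        def conv_bn():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.BatchNorm2d(channels),
            )
        # Ninth to twelfth feature coding subunits, applied in sequence.
        self.body = nn.Sequential(conv_bn(), conv_bn(), conv_bn(), conv_bn())
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Feature superposition layer: add the input to the processed features, then activate.
        return self.act(x + self.body(x))
```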
(4) Encoding the eighth feature encoded data by a tenth feature encoding unit which is input as a sixth image channel and output as a seventh image channel to obtain ninth feature encoded data;
for example, the seventh image channel may be 64, that is, the eighth feature encoded data is subjected to encoding processing by the tenth feature encoding unit which is input as 32 channels and output as 64 channels, resulting in ninth feature encoded data.
The tenth feature encoding unit has the same structure as the eighth feature encoding unit; for the schematic structural diagram of the tenth feature encoding unit, refer to the eighth feature encoding unit.
(5) Encoding the ninth feature encoded data by K eleventh feature encoding units which are input into a seventh image channel and output into the seventh image channel to obtain tenth feature encoded data, wherein K is a positive integer and is not less than 2;
similarly, the seventh image channel is 64, that is, the ninth feature coded data is coded by K eleventh feature coding units which are input into 64 channels and output into 64 channels, so as to obtain tenth feature coded data.
The eleventh feature encoding unit has the same structure as the ninth feature encoding unit; for the schematic structural diagram of the eleventh feature encoding unit, refer to the ninth feature encoding unit.
(6) Encoding the tenth feature encoded data by a twelfth feature encoding unit which is input as a seventh image channel and output as an eighth image channel to obtain eleventh feature encoded data;
for example, the eighth image channel may be 128, that is, the tenth feature encoded data is subjected to encoding processing by the twelfth feature encoding unit which is input as 64 channels and output as 128 channels, to obtain eleventh feature encoded data.
The twelfth feature encoding unit has the same structure as the eighth feature encoding unit; for the schematic structural diagram of the twelfth feature encoding unit, refer to the eighth feature encoding unit.
(7) And encoding the eleventh feature encoded data by L thirteenth feature encoding units which are input into the eighth image channel and output into the eighth image channel to obtain a high-level feature map, wherein L is a positive integer and is not less than 3.
For example, as above, the eighth image channel is 128, so the second feature encoding module also takes a 3-channel RGB video frame as input and outputs a 128-channel feature map.
The first feature encoding module and the second feature encoding module output the same number of image channels; if the resolution of the low-level feature map output by the first feature encoding module is H × W, the resolution of the high-level feature map output by the second feature encoding module is 1/4H × 1/4W, where H denotes height and W denotes width. For example, if the resolution of the low-level feature map is 256 × 256, the resolution of the high-level feature map is 64 × 64.
The thirteenth feature encoding unit has the same structure as the ninth feature encoding unit; for the schematic structural diagram of the thirteenth feature encoding unit, refer to the ninth feature encoding unit.
According to the scheme, high-level, low-level, and multi-scale feature extraction and fusion processes are designed, so that features can be sufficiently aggregated, which helps improve the video portrait segmentation effect.
202. And sequentially carrying out deep convolution processing, batch regularization processing, point convolution processing, batch regularization processing and point convolution processing on the low-level feature map through a first feature aggregation unit of the feature aggregation module and the point convolution layer to obtain a first feature map.
The first feature aggregation unit comprises a depth convolution layer, a batch regularization layer, a point convolution layer and a batch regularization layer which are connected in sequence.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a feature aggregation module according to an embodiment of the present disclosure. The feature aggregation module in the embodiment of the present application includes four processing branches, which are a first processing branch, a second processing branch, a third processing branch, and a fourth processing branch from left to right as shown in the figure. In this step, the corresponding processing is performed in the first processing branch.
The resolution of the first feature map in this step is H × W.
203. And sequentially carrying out standard convolution processing, batch regularization processing, bilinear interpolation processing and activation processing on the high-level feature map through a second feature aggregation unit, a bilinear interpolation layer and an activation function layer of the feature aggregation module to obtain a second feature map.
The second characteristic aggregation unit comprises a standard convolution layer and a batch regularization layer which are connected in sequence.
Since the resolution of the low-level feature map is different from that of the high-level feature map, the high-level feature map is upsampled through a bilinear interpolation layer (Bilinear Interpolation) so that the resolution of the second feature map is also H × W, which allows the first feature map and the second feature map to be fused in the subsequent step.
With reference to fig. 9, the third processing branch is processed accordingly in this step.
204. And multiplying the first feature map and the second feature map by a feature aggregation module to obtain a first fused feature map.
For example, the first feature map and the second feature map with the resolution H × W obtained by the first processing branch and the third processing branch are multiplied (Element-wise Multiplication) to obtain a first fused feature map with the resolution H × W.
205. And sequentially carrying out standard convolution processing, batch regularization processing and average pooling processing on the low-level feature map through a third feature aggregation unit and an average pooling layer of the feature aggregation module to obtain a third feature map.
The third feature aggregation unit comprises a standard convolution layer and a batch regularization layer.
With reference to fig. 9, the corresponding processing is performed in the second processing branch.
The resolution of the third feature map in this step is 1/4H × 1/4W.
206. And sequentially carrying out standard convolution processing, batch regularization processing, point convolution processing and activation processing on the high-level feature map through a fourth feature aggregation unit, a point convolution layer and an activation function layer of the feature aggregation module to obtain a fourth feature map.
The fourth feature aggregation unit comprises a standard convolution layer and a batch regularization layer.
With reference to fig. 9, in this step, the corresponding processing is performed in the fourth processing branch.
The resolution of the fourth feature map in this step is 1/4H × 1/4W.
207. And multiplying the third feature map and the fourth feature map by a feature aggregation module to obtain a second fused feature map.
For example, the third feature map and the fourth feature map with a resolution of 1/4H × 1/4W, which are obtained in the second processing branch and the fourth processing branch, are multiplied together to obtain a second fused feature map with a resolution of 1/4H × 1/4W.
208. And sampling the first fusion feature map and the second fusion feature map to the same resolution through the feature aggregation module, and then adding them to obtain the target fusion feature map.
Referring to fig. 9, since the resolutions of the first fused feature map and the second fused feature map are different, the second fused feature map may be interpolated by a Bilinear Interpolation layer (Bilinear Interpolation) to obtain a feature map with H × W resolution, and then the two feature maps are added to obtain the target fused feature map.
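Putting steps 202 to 208 together, the four-branch feature aggregation module can be sketched in PyTorch as follows. The 128-channel width, the 3x3/1x1 kernel sizes, and the sigmoid used as the "activation processing" in the third and fourth branches are assumptions (a sigmoid is typical for this kind of attention-style gating but is not stated in the description); the bilinear interpolation, average pooling, element-wise multiplication, and addition follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregationModule(nn.Module):
    """Sketch of the four-branch feature aggregation module (steps 202-208)."""
    def __init__(self, ch=128):
        super().__init__()
        # First branch: depthwise conv + BN + pointwise conv + BN, then a pointwise conv.
        self.branch1 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False), nn.BatchNorm2d(ch),
            nn.Conv2d(ch, ch, 1, bias=False), nn.BatchNorm2d(ch),
            nn.Conv2d(ch, ch, 1, bias=False),
        )
        # Second branch: standard conv + BN (average pooling applied in forward).
        self.branch2 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch))
        # Third branch: standard conv + BN (bilinear upsampling + activation in forward).
        self.branch3 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch))
        # Fourth branch: standard conv + BN + pointwise conv (activation in forward).
        self.branch4 = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch),
            nn.Conv2d(ch, ch, 1, bias=False))

    def forward(self, low, high):
        size_low, size_high = low.shape[2:], high.shape[2:]
        first = self.branch1(low)                                          # step 202, H x W
        second = torch.sigmoid(                                            # step 203, up to H x W
            F.interpolate(self.branch3(high), size=size_low,
                          mode='bilinear', align_corners=False))
        first_fused = first * second                                       # step 204
        third = F.adaptive_avg_pool2d(self.branch2(low), size_high)        # step 205
        fourth = torch.sigmoid(self.branch4(high))                         # step 206
        second_fused = third * fourth                                      # step 207
        second_up = F.interpolate(second_fused, size=size_low,             # step 208
                                  mode='bilinear', align_corners=False)
        return first_fused + second_up
```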
Referring to fig. 10, fig. 10 is a schematic diagram of the basic composition structures provided in an embodiment of the present application. The embodiment provided in the present application includes 7 basic composition structures: a first basic composition structure 10, a second basic composition structure 20, a third basic composition structure 30, a fourth basic composition structure 40, a fifth basic composition structure 50, a sixth basic composition structure 60, and a seventh basic composition structure 70.
The second basic composition structure 20 is composed of a convolution layer (Conv), a batch regularization layer (Batch Normalization), and a ReLU activation function layer.
The first basic composition structure 10 is the second basic composition structure 20 with the convolution stride set to 2.
The third basic composition structure 30 is composed of a depthwise convolution layer and a batch regularization layer, where the convolution kernel size of the depthwise convolution is 3 × 3.
The fourth basic composition structure 40 is composed of a point convolution layer (PointwiseConv) and a batch regularization layer, where the point convolution is a convolution with a kernel size of 1 × 1.
The fifth basic composition structure 50 is the second basic composition structure 20 with the ReLU activation function layer removed.
The sixth basic composition structure 60 is composed of a depthwise convolution layer (DepthwiseConv), a batch regularization layer, and a point convolution layer.
The seventh basic composition structure 70 is the first basic composition structure 10 with the ReLU activation function layer removed.
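The seven basic composition structures can be written as small PyTorch building blocks. The sketch below is illustrative only: the 3x3 kernel size for standard convolutions and the channel arguments are assumptions (only the depthwise 3x3 and pointwise 1x1 sizes are stated explicitly).

```python
import torch.nn as nn

def structure_20(in_ch, out_ch, stride=1):
    """Second basic composition structure: Conv + BatchNorm + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

def structure_10(in_ch, out_ch):
    """First basic composition structure: structure 20 with stride 2."""
    return structure_20(in_ch, out_ch, stride=2)

def structure_30(ch):
    """Third basic composition structure: depthwise 3x3 Conv + BatchNorm."""
    return nn.Sequential(
        nn.Conv2d(ch, ch, 3, padding=1, groups=ch, bias=False), nn.BatchNorm2d(ch))

def structure_40(in_ch, out_ch):
    """Fourth basic composition structure: pointwise 1x1 Conv + BatchNorm."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch))

def structure_50(in_ch, out_ch, stride=1):
    """Fifth basic composition structure: structure 20 without the ReLU layer."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch))

def structure_60(in_ch, out_ch):
    """Sixth basic composition structure: depthwise Conv + BatchNorm + pointwise Conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.Conv2d(in_ch, out_ch, 1, bias=False))

def structure_70(in_ch, out_ch):
    """Seventh basic composition structure: structure 10 without the ReLU layer."""
    return structure_50(in_ch, out_ch, stride=2)
```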
In an embodiment, the first feature encoding unit, the third feature encoding unit, the fifth feature encoding unit, and the second feature coding subunit are all the first basic composition structure 10.
In an embodiment, the second feature encoding unit, the fourth feature encoding unit, the sixth feature encoding unit, the first feature coding subunit, the third feature coding subunit, and the ninth feature coding subunit are all the second basic composition structure 20.
In an embodiment, the fourth feature coding subunit, the sixth feature coding subunit, and the tenth feature coding subunit are all the third basic composition structure 30.
In an embodiment, the fifth feature coding subunit and the eleventh feature coding subunit are both the fourth basic composition structure 40.
In an embodiment, the seventh feature coding subunit, the twelfth feature coding subunit, the second feature aggregation unit, and the fourth feature aggregation unit are all the fifth basic composition structure 50.
In an embodiment, the eighth feature coding subunit and the first feature aggregation unit are both the sixth basic composition structure 60.
In an embodiment, the third feature aggregation unit is the seventh basic composition structure 70.
In addition, in an implementation manner, the portrait segmentation model provided in the embodiment of the present application may further include a feature decoding module. Referring to fig. 11, fig. 11 is a schematic structural diagram of a feature decoding module according to an embodiment of the present disclosure.
The feature decoding module may include a first feature decoding unit, a regularization layer (Dropout layer), a second feature decoding unit, and a Bilinear Interpolation layer (Bilinear Interpolation).
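A hedged sketch of the feature decoding module described above (first feature decoding unit, Dropout regularization layer, second feature decoding unit, bilinear interpolation layer) is given below in PyTorch. The exact composition of the two decoding units, the dropout probability, and the 4x upsampling factor are assumptions; the 2 output channels follow the training description below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDecoder(nn.Module):
    """Sketch: decode the target fused feature map into a 2-channel portrait mask map."""
    def __init__(self, in_ch=128, mid_ch=64, num_classes=2, p_drop=0.1):
        super().__init__()
        self.decode1 = nn.Sequential(                      # first feature decoding unit (assumed Conv-BN-ReLU)
            nn.Conv2d(in_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.dropout = nn.Dropout2d(p=p_drop)              # regularization (Dropout) layer
        self.decode2 = nn.Conv2d(mid_ch, num_classes, 1)   # second feature decoding unit (assumed 1x1 Conv)

    def forward(self, x):
        x = self.decode2(self.dropout(self.decode1(x)))
        # Bilinear interpolation back to the original image resolution (assumed to be 4x the fused map).
        return F.interpolate(x, scale_factor=4, mode='bilinear', align_corners=False)
```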
In one embodiment, the portrait segmentation model may be trained on the YouTube-VOS dataset, which is a video object segmentation dataset, using the PyTorch framework and 8 NVIDIA Tesla V100 GPUs. The loss function may be the cross entropy loss (Cross Entropy Loss), and the evaluation function may be the intersection-over-union function (IoU).
Referring to fig. 2, the specific steps of the portrait segmentation scheme may be as follows:
s11: firstly, training a portrait segmentation model. The data set is divided into a test set and a training set according to the ratio of 2: 8. And performing data enhancement processing including random rotation, random left-right turning, random clipping, Gamma (Gamma) transformation and the like on the training set.
S12: in a training period, all video frame images of the training set are traversed, and the original images are preprocessed, including random cropping and normalization.
S13: the preprocessed original image is fed into the first feature encoding module and the second feature encoding module to obtain a low-level feature map and a high-level feature map. The resolution of the low-level feature map is H × W, the resolution of the high-level feature map is 1/4H × 1/4W, and both have 128 channels. The two feature maps are fed into the multi-scale feature aggregation module for feature fusion to obtain a fused feature map, which is then fed into the feature decoder to output a portrait mask map with 2 channels and the resolution of the original image; the resolution of the original image is 4H × 4W. For example, the portrait mask maps corresponding to the output video are portrait mask map t-1, portrait mask map t, and portrait mask map t+1; likewise, the portrait annotation maps correspond one-to-one to the portrait mask maps.
S14: the cross entropy loss between the portrait mask map and the annotation map is calculated, the back propagation algorithm is run over the whole network, and the parameters are updated.
S15: S12-S14 are repeated over a plurality of training periods until the loss function fully converges, and the network model structure and parameters are saved.
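A minimal training-loop sketch corresponding to S12-S15 is given below, assuming PyTorch, a model that returns a 2-channel mask logit map, and a train_loader that yields preprocessed frames with integer annotation maps; the Adam optimizer, learning rate, and epoch count are assumptions not stated in the description.

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, epochs=100, lr=1e-3, device='cuda'):
    """Sketch of S12-S15: traverse the training frames, compute the cross entropy
    loss between the predicted mask map and the annotation map, and back-propagate."""
    model = model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer choice is an assumption
    for epoch in range(epochs):
        for frames, labels in train_loader:            # preprocessed frames and annotation maps
            frames, labels = frames.to(device), labels.to(device)
            logits = model(frames)                     # 2-channel portrait mask logits
            loss = F.cross_entropy(logits, labels)     # S14: cross entropy loss
            optimizer.zero_grad()
            loss.backward()                            # back propagation
            optimizer.step()                           # parameter update
    torch.save(model.state_dict(), 'portrait_seg.pth') # S15: save parameters
```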
And S16, deploying the trained network model and loading the network.
S17: real-time video frames are acquired from the video frame stream, taking one video frame every n frames according to the processing capability of the deployment machine, where n is a natural number; for example, n may be 1, and the acquired video frames are video frame t-1, video frame t, and video frame t+1. The acquired video frames are preprocessed and fed into the trained convolutional neural network to obtain the corresponding portrait mask maps, and the portrait in the original frame is segmented using the portrait mask map to obtain the portrait segmentation result of the current video frame t.
S18: S17 is repeated to obtain the portrait segmentation mask result of the whole video.
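S16-S18 can be illustrated with the following hedged sketch: take one frame every n frames from a video stream, run the trained network to obtain a portrait mask map, and apply the mask to the original frame. The normalization used as preprocessing and the argmax used to binarize the mask are illustrative assumptions.

```python
import torch

def segment_video(model, frames, n=1, device='cuda'):
    """Sketch of S17-S18: segment the portrait in every n-th video frame."""
    model = model.to(device).eval()
    results = []
    with torch.no_grad():
        for idx, frame in enumerate(frames):           # frame: uint8 tensor, H x W x 3
            if idx % n != 0:
                continue
            x = frame.permute(2, 0, 1).float().unsqueeze(0).to(device) / 255.0  # preprocessing (assumed)
            logits = model(x)                          # 2-channel mask logits at original resolution
            mask = logits.argmax(dim=1)[0].cpu()       # portrait mask map (0 = background, 1 = portrait)
            segmented = frame * mask.unsqueeze(-1).to(frame.dtype)  # keep only portrait pixels
            results.append((mask, segmented))
    return results
```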
The cross entropy loss is specifically formulated as follows:
L = -(1/N) Σ_{i=1..N} [ y_i·log(p_i) + (1 − y_i)·log(1 − p_i) ]
For a corresponding pixel i in the portrait mask map and the annotation map, y_i denotes the value of pixel i in the annotation map and p_i denotes the predicted value of pixel i in the portrait mask map; N is the total number of pixels in a single sample. The loss over all samples is the average of the per-sample losses; ideally, the loss is 0.
The specific formula of the evaluation function for a single sample is as follows:
IoU = |X ∩ Y| / |X ∪ Y|
where X denotes the output mask and Y denotes the annotation. IoU is evaluated over all samples.
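The cross entropy loss and IoU evaluation above can be written compactly. The sketch below assumes binary masks and PyTorch, treating the predicted mask map p as per-pixel probabilities and the annotation y as per-pixel labels; it is an illustration of the formulas, not the training code itself.

```python
import torch

def pixel_cross_entropy(p, y, eps=1e-7):
    """Average per-pixel binary cross entropy between predictions p and labels y (both in [0, 1])."""
    p = p.clamp(eps, 1 - eps)
    return -(y * p.log() + (1 - y) * (1 - p).log()).mean()

def iou(mask, label):
    """Intersection over union between a binary output mask X and annotation Y."""
    mask, label = mask.bool(), label.bool()
    inter = (mask & label).sum().float()
    union = (mask | label).sum().float()
    return (inter / union).item() if union > 0 else 1.0
```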
As can be seen from the above, in the feature extraction method provided in the embodiment of the present application, the first feature coding module and the second feature coding module extract the low-level feature map and the high-level feature map of the image to be processed, respectively; the feature aggregation module samples each of the two feature maps to the resolution of the other and fuses them to obtain a first fused feature map and a second fused feature map, and then samples the two fused feature maps to the same resolution and fuses them to obtain the target fused feature map of the image to be processed. By extracting the low-level features and the high-level features separately and fusing them, the scheme improves the feature extraction effect and thereby the accuracy of image segmentation.
In addition, the scheme adopts a lightweight design and makes full use of depthwise convolution, so the amount of computation is small and video frames can be segmented in real time, which allows the scheme to be deployed on mobile terminals such as mobile phones. For example, the scheme can provide an accurate portrait mask for various image processing algorithms on a mobile phone, including real-time video portrait beautification and background replacement, and can make applications such as video portrait background blurring more accurate.
In one embodiment, a feature extraction apparatus is also provided. Referring to fig. 12, fig. 12 is a schematic structural diagram of a feature extraction device according to an embodiment of the present disclosure. Wherein the feature extraction apparatus 300 is applied to an electronic device, the feature extraction apparatus 300 includes: the feature extraction module 301, the first processing module 302, the first feature fusion module 303, the second processing module 304, the second feature fusion module 305, and the third feature fusion module 306 are as follows:
a feature extraction module 301, configured to extract a low-level feature map of an image to be processed through a first feature encoding module, and extract a high-level feature map of the image to be processed through a second feature encoding module;
a first processing module 302, configured to perform a first processing on the low-level feature map through a feature aggregation module to obtain a first feature map with a resolution that is the same as that of the low-level feature map, and perform a second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with a resolution that is the same as that of the low-level feature map;
a first feature fusion module 303, configured to fuse the first feature map and the second feature map by the feature aggregation module to obtain a first fused feature map;
a second processing module 304, configured to perform third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map, and perform fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map;
a second feature fusion module 305, configured to fuse the third feature map and the fourth feature map by the feature aggregation module to obtain a second fused feature map;
and a third feature fusion module 306, configured to fuse the first fusion feature map and the second fusion feature map after sampling the first fusion feature map and the second fusion feature map to a same resolution through the feature aggregation module, so as to obtain a target fusion feature map corresponding to the image to be processed.
It should be noted that the feature extraction device provided in the embodiment of the present application and the feature extraction method in the foregoing embodiment belong to the same concept, and any method provided in the embodiment of the feature extraction method can be implemented by the feature extraction device, and the specific implementation process thereof is described in detail in the embodiment of the feature extraction method, and is not described herein again.
As can be seen from the above, in the feature extraction apparatus provided in the embodiment of the present application, the feature extraction module 301 extracts a low-level feature map of the image to be processed through the first feature encoding module and extracts a high-level feature map of the image to be processed through the second feature encoding module; the first processing module 302 performs first processing on the low-level feature map through the feature aggregation module to obtain a first feature map with the same resolution as the low-level feature map, and performs second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as the low-level feature map; the first feature fusion module 303 fuses the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map; the second processing module 304 performs third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as the high-level feature map, and performs fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as the high-level feature map; the second feature fusion module 305 fuses the third feature map and the fourth feature map through the feature aggregation module to obtain a second fused feature map; and the third feature fusion module 306 samples the first fused feature map and the second fused feature map to the same resolution through the feature aggregation module and then fuses them to obtain a target fused feature map corresponding to the image to be processed. By extracting the low-level features and the high-level features separately and fusing them, the scheme improves the feature extraction effect and thereby the accuracy of portrait segmentation.
The embodiment of the application also provides the electronic equipment. The electronic device can be a smart phone, a tablet computer and the like. Referring to fig. 13, fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 400 comprises a processor 401 and a memory 402. The processor 401 is electrically connected to the memory 402.
The processor 401 is a control center of the electronic device 400, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or calling a computer program stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device.
The memory 402 may be used to store computer programs and data. The computer programs stored in the memory 402 contain instructions executable by the processor and may constitute various functional modules. The processor 401 executes various functional applications and performs data processing by calling the computer programs stored in the memory 402.
In this embodiment, the processor 401 in the electronic device 400 loads the instructions corresponding to the processes of one or more computer programs into the memory 402 and runs the computer programs stored in the memory 402, so as to implement the following functions (an illustrative sketch of these steps is given after the list):
extracting a low-level feature map of an image to be processed through a first feature coding module, and extracting a high-level feature map of the image to be processed through a second feature coding module;
performing first processing on the low-level feature map through a feature aggregation module to obtain a first feature map with the same resolution as that of the low-level feature map, and performing second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as that of the low-level feature map;
fusing the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map;
performing third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map, and performing fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map;
fusing the third feature map and the fourth feature map through the feature aggregation module to obtain a second fused feature map;
and sampling the first fusion feature map and the second fusion feature map to the same resolution through the feature aggregation module, and then fusing to obtain a target fusion feature map corresponding to the image to be processed.
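Purely as an illustrative aid, the six steps above can be sketched in PyTorch roughly as follows. The fusion operators follow the dependent claims below (element-wise multiplication for the first and second fused maps, addition for the target map), while the channel counts, kernel sizes, activation choices, and the simplified per-branch layer stacks are assumptions made only so the sketch runs; the dependent claims give the full layer orders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAggregationSketch(nn.Module):
    # Illustrative sketch of the bidirectional aggregation; not the disclosed implementation.
    def __init__(self, low_ch=64, high_ch=128, mid_ch=64):
        super().__init__()
        # First processing: refine the low-level map at its own resolution (simplified).
        self.first = nn.Sequential(nn.Conv2d(low_ch, mid_ch, 1), nn.BatchNorm2d(mid_ch))
        # Second processing: project the high-level map, then upsample it to the low-level resolution.
        self.second = nn.Sequential(nn.Conv2d(high_ch, mid_ch, 1), nn.BatchNorm2d(mid_ch))
        # Third processing: project and downsample the low-level map to the high-level resolution.
        self.third = nn.Sequential(nn.Conv2d(low_ch, mid_ch, 3, stride=2, padding=1),
                                   nn.BatchNorm2d(mid_ch))
        # Fourth processing: refine the high-level map at its own resolution.
        self.fourth = nn.Sequential(nn.Conv2d(high_ch, mid_ch, 1), nn.BatchNorm2d(mid_ch),
                                    nn.Sigmoid())

    def forward(self, low, high):
        size_low, size_high = low.shape[2:], high.shape[2:]
        f1 = self.first(low)                                      # first feature map (low-level resolution)
        f2 = torch.sigmoid(F.interpolate(self.second(high), size=size_low,
                                         mode='bilinear', align_corners=False))   # second feature map
        fused_1 = f1 * f2                                         # first fused feature map
        f3 = F.interpolate(self.third(low), size=size_high,
                           mode='bilinear', align_corners=False)  # third feature map (high-level resolution)
        f4 = self.fourth(high)                                    # fourth feature map
        fused_2 = f3 * f4                                         # second fused feature map
        fused_2 = F.interpolate(fused_2, size=size_low,
                                mode='bilinear', align_corners=False)  # bring both to the same resolution
        return fused_1 + fused_2                                  # target fused feature map

# Example with assumed scales: low-level map at 1/8 of the input, high-level map at 1/32.
low = torch.randn(1, 64, 28, 28)
high = torch.randn(1, 128, 7, 7)
target = FeatureAggregationSketch()(low, high)   # torch.Size([1, 64, 28, 28])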
In some embodiments, referring to fig. 14, fig. 14 is a second schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 400 further includes: a radio frequency circuit 403, a display screen 404, a control circuit 405, an input unit 406, an audio circuit 407, a sensor 408, and a power supply 409. The processor 401 is electrically connected to the radio frequency circuit 403, the display screen 404, the control circuit 405, the input unit 406, the audio circuit 407, the sensor 408, and the power supply 409.
The radio frequency circuit 403 is used for transceiving radio frequency signals to communicate with a network device or other electronic devices through wireless communication.
The display screen 404 may be used to display information entered by or provided to the user as well as various graphical user interfaces of the electronic device, which may be comprised of images, text, icons, video, and any combination thereof.
The control circuit 405 is electrically connected to the display screen 404, and is configured to control the display screen 404 to display information.
The input unit 406 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input unit 406 may include a fingerprint recognition module.
The audio circuit 407 may provide an audio interface between the user and the electronic device through a speaker and a microphone. The audio circuit 407 includes a microphone, which is electrically connected to the processor 401 and is used to receive voice information input by the user.
The sensor 408 is used to collect external environmental information. The sensors 408 may include one or more of ambient light sensors, acceleration sensors, gyroscopes, etc.
The power supply 409 is used to supply power to the various components of the electronic device 400. In some embodiments, the power supply 409 may be logically connected to the processor 401 through a power management system, so that functions such as managing charging, discharging, and power consumption are implemented through the power management system.
Although not shown in the drawings, the electronic device 400 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
As can be seen from the above, an embodiment of the present application provides an electronic device. The electronic device extracts a low-level feature map and a high-level feature map of an image to be processed through a first feature coding module and a second feature coding module, samples each of the two feature maps to the other's resolution through a feature aggregation module and fuses them to obtain a first fused feature map and a second fused feature map, and then samples the first fused feature map and the second fused feature map to the same resolution and fuses them to obtain a target fused feature map of the image to be processed. By extracting the low-level features and the high-level features separately and fusing them, the feature extraction effect can be improved, and the image segmentation accuracy can be improved accordingly.
An embodiment of the present application further provides a storage medium in which a computer program is stored. When the computer program runs on a computer, the computer is caused to execute the feature extraction method according to any one of the above embodiments.
It should be noted that all or part of the steps in the methods of the above embodiments may be implemented by instructing relevant hardware through a computer program, and the computer program may be stored in a computer-readable storage medium, which may include, but is not limited to: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Furthermore, the terms "first", "second", and "third", etc. in this application are used to distinguish different objects, and are not used to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
The feature extraction method, the feature extraction device, the storage medium, and the electronic device provided in the embodiments of the present application are described in detail above. The principle and the implementation of the present application are explained herein by applying specific examples, and the above description of the embodiments is only used to help understand the method and the core idea of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (21)

1. A method of feature extraction, comprising:
extracting a low-level feature map of an image to be processed through a first feature coding module, and extracting a high-level feature map of the image to be processed through a second feature coding module;
performing first processing on the low-level feature map through a feature aggregation module to obtain a first feature map with the same resolution as that of the low-level feature map, and performing second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as that of the low-level feature map;
fusing the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map;
performing third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map, and performing fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map;
fusing the third feature map and the fourth feature map through the feature aggregation module to obtain a second fused feature map;
and sampling the first fusion feature map and the second fusion feature map to the same resolution through the feature aggregation module, and then fusing to obtain a target fusion feature map corresponding to the image to be processed.
2. The feature extraction method according to claim 1, wherein the feature aggregation module includes a first feature aggregation unit and a point convolution layer, wherein the first feature aggregation unit includes a depth convolution layer, a batch regularization layer, a point convolution layer, and a batch regularization layer, which are sequentially connected, and the first processing is performed on the low-level feature map by the feature aggregation module to obtain a first feature map having a resolution that is the same as that of the low-level feature map, including:
and sequentially carrying out deep convolution processing, batch regularization processing, point convolution processing, batch regularization processing and point convolution processing on the low-level feature map through the first feature aggregation unit and the point convolution layer to obtain the first feature map.
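As an illustrative sketch only, the processing order recited in claim 2 (depth convolution, batch regularization, point convolution, batch regularization, point convolution) could be written in PyTorch as below; the 3x3 depthwise kernel and the unchanged channel count are assumptions, since the claim fixes neither.

import torch.nn as nn

def first_processing_branch(channels=64):
    # depth conv -> batch regularization -> point conv -> batch regularization -> point conv
    return nn.Sequential(
        nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depth (depthwise) convolution
        nn.BatchNorm2d(channels),                                      # batch regularization
        nn.Conv2d(channels, channels, 1),                              # point convolution
        nn.BatchNorm2d(channels),                                      # batch regularization
        nn.Conv2d(channels, channels, 1),                              # point convolution layer of the module
    )

With padding=1 the branch keeps the spatial size of the low-level feature map, matching the claimed resolution constraint.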
3. The feature extraction method according to claim 1, wherein the feature aggregation module includes a second feature aggregation unit, a bilinear interpolation layer, and an activation function layer, wherein the second feature aggregation unit includes a standard convolution layer and a batch regularization layer that are sequentially connected, and the second processing is performed on the high-level feature map by the feature aggregation module to obtain a second feature map having a resolution that is the same as that of the low-level feature map, and includes:
and sequentially carrying out standard convolution processing, batch regularization processing, bilinear interpolation processing and activation processing on the high-level feature map through the second feature aggregation unit, the bilinear interpolation layer and the activation function layer to obtain the second feature map.
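One possible reading of this order, sketched in PyTorch; the 3x3 kernel, the output channel count, and the sigmoid activation are assumptions (the claim does not name the activation function).

import torch.nn as nn
import torch.nn.functional as F

class SecondProcessingBranch(nn.Module):
    def __init__(self, high_ch=128, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(high_ch, out_ch, 3, padding=1)  # standard convolution
        self.bn = nn.BatchNorm2d(out_ch)                      # batch regularization
        self.act = nn.Sigmoid()                               # activation (assumed sigmoid)

    def forward(self, high, low_size):
        x = self.bn(self.conv(high))
        x = F.interpolate(x, size=low_size, mode='bilinear',
                          align_corners=False)                # bilinear interpolation to the low-level resolution
        return self.act(x)                                    # second feature map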
4. The feature extraction method according to claim 1, wherein the fusing the first feature map and the second feature map by the feature aggregation module to obtain a first fused feature map includes:
and multiplying the first feature map and the second feature map by the feature aggregation module to obtain the first fused feature map.
5. The feature extraction method according to claim 1, wherein the feature aggregation module includes a third feature aggregation unit and an average pooling layer, wherein the third feature aggregation unit includes a standard convolution layer and a batch regularization layer, and the performing of the third processing on the low-level feature map by the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map includes:
and sequentially performing standard convolution processing, batch regularization processing and average pooling processing on the low-level feature map through the third feature aggregation unit and the average pooling layer to obtain the third feature map.
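Sketched the same way; the strided 3x3 convolution and the 2x2 pooling window are assumptions, and the exact strides would depend on the resolution ratio between the low-level and high-level feature maps.

import torch.nn as nn

def third_processing_branch(low_ch=64, out_ch=128):
    # standard convolution -> batch regularization -> average pooling
    return nn.Sequential(
        nn.Conv2d(low_ch, out_ch, 3, stride=2, padding=1),  # standard convolution
        nn.BatchNorm2d(out_ch),                             # batch regularization
        nn.AvgPool2d(kernel_size=2, stride=2),              # average pooling down to the high-level resolution
    )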
6. The feature extraction method of claim 1, wherein the feature aggregation module includes a fourth feature aggregation unit, a point convolution layer, and an activation function layer, wherein the fourth feature aggregation unit includes a standard convolution layer and a batch regularization layer, and the fourth processing of the high-level feature map by the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map includes:
and sequentially performing standard convolution processing, batch regularization processing, point convolution processing and activation processing on the high-level feature map through the fourth feature aggregation unit, the point convolution layer and the activation function layer to obtain the fourth feature map.
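A corresponding sketch; the kernel size and the sigmoid activation are again assumptions.

import torch.nn as nn

def fourth_processing_branch(high_ch=128):
    # standard convolution -> batch regularization -> point convolution -> activation
    return nn.Sequential(
        nn.Conv2d(high_ch, high_ch, 3, padding=1),  # standard convolution
        nn.BatchNorm2d(high_ch),                    # batch regularization
        nn.Conv2d(high_ch, high_ch, 1),             # point convolution
        nn.Sigmoid(),                               # activation (assumed sigmoid)
    )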
7. The feature extraction method according to claim 1, wherein the fusing the third feature map and the fourth feature map by the feature aggregation module to obtain a second fused feature map includes:
and multiplying the third feature map and the fourth feature map by the feature aggregation module to obtain the second fused feature map.
8. The feature extraction method of claim 1, wherein the obtaining of the target fusion feature map corresponding to the image to be processed by fusing the first fusion feature map and the second fusion feature map after sampling the first fusion feature map and the second fusion feature map to a same resolution by the feature aggregation module comprises:
and sampling the first fused feature map and the second fused feature map to the same resolution as the low-level feature map through the feature aggregation module, and adding the two to obtain the target fused feature map.
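A minimal sketch of this final step, assuming bilinear sampling (the claim only says "sampling") and fused feature maps whose channel counts already agree.

import torch.nn.functional as F

def fuse_to_target(fused_1, fused_2, low_size):
    # sample both fused feature maps to the low-level resolution, then add them element-wise
    a = F.interpolate(fused_1, size=low_size, mode='bilinear', align_corners=False)
    b = F.interpolate(fused_2, size=low_size, mode='bilinear', align_corners=False)
    return a + b  # target fused feature map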
9. The feature extraction method according to claim 1, wherein the first feature coding module includes a first feature coding unit, a second feature coding unit, a third feature coding unit, a fourth feature coding unit, a fifth feature coding unit, and a sixth feature coding unit, and the extracting the low-level feature map of the image to be processed by the first feature coding module includes:
coding the image to be processed through a first feature coding unit which is input as a first image channel and output as a second image channel to obtain first feature coded data;
coding the first feature coded data through a second feature coding unit which is input as a second image channel and output as the second image channel to obtain second feature coded data;
coding the second feature coded data through a third feature coding unit which is input as a second image channel and output as the second image channel to obtain third feature coded data;
encoding the third feature encoded data by N fourth feature encoding units which are input as a second image channel and output as the second image channel to obtain fourth feature encoded data, wherein N is a positive integer and is not less than 2;
encoding the fourth feature encoding data through a fifth feature encoding unit which is input as a second image channel and output as a third image channel to obtain fifth feature encoding data;
and encoding the fifth feature encoded data by M sixth feature encoding units which are input as a third image channel and output as the third image channel to obtain the low-level feature map, wherein M is a positive integer and is not less than 3.
10. The feature extraction method according to claim 9, wherein the first feature encoding unit, the third feature encoding unit, and the fifth feature encoding unit include a standard convolution layer, a batch regularization layer, and an activation function layer, which are connected in sequence; the second feature encoding unit, the fourth feature encoding unit and the sixth feature encoding unit comprise a standard convolution layer and a batch regularization layer which are connected in sequence.
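Claims 9 and 10 together describe the first feature encoding module as a stack of convolution/batch-regularization units, some followed by an activation. A sketch under assumed channel widths c1/c2/c3 and the minimum repeat counts N=2 and M=3 (the claims fix neither the widths nor the kernel sizes):

import torch.nn as nn

def coding_unit(in_ch, out_ch, with_act=True):
    # generic stand-in for a feature encoding unit (claim 10 gives the two layer layouts)
    layers = [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch)]
    if with_act:
        layers.append(nn.ReLU(inplace=True))  # activation function layer (assumed ReLU)
    return nn.Sequential(*layers)

def first_feature_encoding_module(c1=3, c2=32, c3=64, N=2, M=3):
    return nn.Sequential(
        coding_unit(c1, c2, with_act=True),                        # first feature encoding unit
        coding_unit(c2, c2, with_act=False),                       # second feature encoding unit
        coding_unit(c2, c2, with_act=True),                        # third feature encoding unit
        *[coding_unit(c2, c2, with_act=False) for _ in range(N)],  # N fourth feature encoding units
        coding_unit(c2, c3, with_act=True),                        # fifth feature encoding unit
        *[coding_unit(c3, c3, with_act=False) for _ in range(M)],  # M sixth feature encoding units -> low-level feature map
    )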
11. The feature extraction method according to claim 1, wherein the second feature encoding module includes a seventh feature encoding unit, an eighth feature encoding unit, a ninth feature encoding unit, a tenth feature encoding unit, an eleventh feature encoding unit, a twelfth feature encoding unit, and a thirteenth feature encoding unit, and the extracting the high-level feature map of the image to be processed by the second feature encoding module includes:
encoding the image to be processed through a seventh feature encoding unit which is input as a fourth image channel and output as a fifth image channel to obtain sixth feature encoded data;
encoding the sixth feature encoded data by an eighth feature encoding unit which is input as a fifth image channel and output as a sixth image channel to obtain seventh feature encoded data;
coding the seventh feature coded data by J ninth feature coding units which are input as a sixth image channel and output as the sixth image channel to obtain eighth feature coded data, wherein J is a positive integer and is not less than 2;
encoding the eighth feature encoded data by a tenth feature encoding unit which is input as a sixth image channel and output as a seventh image channel to obtain ninth feature encoded data;
encoding the ninth feature encoded data by K eleventh feature encoding units which are input as a seventh image channel and output as the seventh image channel to obtain tenth feature encoded data, wherein K is a positive integer and is not less than 2;
encoding the tenth feature encoded data by a twelfth feature encoding unit which is input as a seventh image channel and output as an eighth image channel to obtain eleventh feature encoded data;
and performing encoding processing on the eleventh feature encoded data by L thirteenth feature encoding units which are input as an eighth image channel and output as the eighth image channel to obtain the high-level feature map, wherein L is a positive integer and is not less than 3.
12. The feature extraction method of claim 11, wherein the seventh feature encoding unit includes a convolution layer, an activation function layer, a first feature encoding subunit, a second feature encoding subunit, a maximum pooling layer, and a stitching layer, and the encoding the image to be processed by the seventh feature encoding unit that is input as a fourth image channel and output as a fifth image channel to obtain sixth feature encoded data includes:
sequentially performing standard convolution processing and activation processing on the image to be processed through the convolution layer and the activation function layer to obtain first processing data;
sequentially performing standard convolution processing, batch regularization processing and activation processing on the first processing data through the first feature coding subunit to obtain second processing data;
sequentially performing convolution processing, batch regularization processing and activation processing on the second processed data through the second feature coding subunit to obtain third processed data;
performing pooling processing on the first processing data through the maximum pooling layer to obtain fourth processing data;
and splicing the third processed data and the fourth processed data through the splicing layer to obtain the sixth feature encoded data.
13. The feature extraction method of claim 12, wherein the first feature encoding subunit includes a standard convolution layer, a batch regularization layer, and an activation function layer, which are connected in sequence; the second feature coding subunit comprises a standard convolution layer, a batch regularization layer and an activation function layer which are sequentially connected.
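Claims 12 and 13 describe a stem-style unit in which a max-pooled copy of the early features is concatenated with a further-convolved copy. A sketch, with strides, kernel sizes, and channel counts assumed so that the two branches line up for concatenation:

import torch
import torch.nn as nn

class SeventhEncodingUnitSketch(nn.Module):
    def __init__(self, in_ch=3, mid_ch=16, out_ch=32):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 3, stride=2, padding=1),
                                  nn.ReLU(inplace=True))            # convolution layer + activation
        self.sub1 = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
                                  nn.BatchNorm2d(mid_ch),
                                  nn.ReLU(inplace=True))            # first feature coding subunit
        self.sub2 = nn.Sequential(nn.Conv2d(mid_ch, out_ch - mid_ch, 3, stride=2, padding=1),
                                  nn.BatchNorm2d(out_ch - mid_ch),
                                  nn.ReLU(inplace=True))            # second feature coding subunit
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)           # maximum pooling layer

    def forward(self, x):
        first = self.stem(x)                         # first processed data
        second = self.sub1(first)                    # second processed data
        third = self.sub2(second)                    # third processed data
        fourth = self.pool(first)                    # fourth processed data
        return torch.cat([third, fourth], dim=1)     # splicing layer -> sixth feature encoded data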
14. The feature extraction method of claim 11, wherein the eighth feature encoding unit includes a third feature encoding subunit, a fourth feature encoding subunit, a fifth feature encoding subunit, a sixth feature encoding subunit, a seventh feature encoding subunit, an eighth feature encoding subunit, a feature superposition layer, and an activation function layer, and the encoding of the sixth feature encoded data by the eighth feature encoding unit that is input as a fifth image channel and output as a sixth image channel to obtain seventh feature encoded data includes:
sequentially performing convolution processing and batch regularization processing on the sixth feature coded data through the third feature coding subunit, the fourth feature coding subunit, the fifth feature coding subunit, the sixth feature coding subunit and the seventh feature coding subunit to obtain fifth processed data;
sequentially performing deep convolution processing, batch regularization processing, point convolution processing and batch regularization processing on the fifth processed data through the eighth feature coding subunit to obtain sixth processed data;
adding the fifth processed data and the sixth processed data through a feature superposition layer to obtain seventh feature processed data;
and performing activation processing on the seventh feature processing data through an activation function layer to obtain seventh feature coded data.
15. The feature extraction method of claim 14, wherein the third feature encoding subunit includes a standard convolution layer and a batch regularization layer connected in sequence; the fourth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the fifth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the sixth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the seventh feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the eighth feature coding subunit comprises a depth convolution layer, a batch regularization layer, a point convolution layer and a batch regularization layer which are connected in sequence.
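Claims 14 and 15 describe a residual-style unit: a trunk of five convolution/batch-regularization subunits, a depthwise-separable branch applied to the trunk output, an element-wise addition, and an activation. A sketch with assumed stride, channel widths, and ReLU activation:

import torch.nn as nn

class EighthEncodingUnitSketch(nn.Module):
    def __init__(self, in_ch=32, out_ch=48):
        super().__init__()
        def conv_bn(ci, co, stride=1):
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride=stride, padding=1),
                                 nn.BatchNorm2d(co))
        # third to seventh feature coding subunits (standard convolution + batch regularization each)
        self.trunk = nn.Sequential(conv_bn(in_ch, out_ch, stride=2),
                                   conv_bn(out_ch, out_ch), conv_bn(out_ch, out_ch),
                                   conv_bn(out_ch, out_ch), conv_bn(out_ch, out_ch))
        # eighth subunit: depth conv -> batch regularization -> point conv -> batch regularization
        self.branch = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch),
            nn.BatchNorm2d(out_ch),
            nn.Conv2d(out_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch))
        self.act = nn.ReLU(inplace=True)   # activation function layer (assumed ReLU)

    def forward(self, x):
        fifth = self.trunk(x)              # fifth processed data
        sixth = self.branch(fifth)         # sixth processed data
        return self.act(fifth + sixth)     # feature superposition, then activation -> seventh feature encoded data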
16. The feature extraction method according to claim 11, wherein the ninth feature encoding unit includes a ninth feature encoding subunit, a tenth feature encoding subunit, an eleventh feature encoding subunit, a twelfth feature encoding subunit, a feature superposition layer, and an activation function layer, and the encoding processing of the seventh feature encoded data by the J ninth feature encoding units that are input as a sixth image channel and output as the sixth image channel to obtain eighth feature encoded data includes:
sequentially encoding the seventh feature encoded data through the ninth feature encoding subunit, the tenth feature encoding subunit, the eleventh feature encoding subunit, and the twelfth feature encoding subunit to obtain eighth feature processed data;
adding the seventh feature coded data and the eighth feature processed data through the feature superposition layer to obtain ninth processed data;
and performing activation processing on the ninth processed data through the activation function layer to obtain the eighth feature coded data.
17. The feature extraction method of claim 16, wherein the ninth feature coding subunit includes a standard convolution layer and a batch regularization layer connected in sequence; the tenth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the eleventh feature coding subunit comprises a standard convolution layer and a batch regularization layer which are sequentially connected; the twelfth feature coding subunit comprises a standard convolution layer and a batch regularization layer which are connected in sequence.
18. The feature extraction method according to any one of claims 11 to 17, wherein the eighth feature encoding unit, the tenth feature encoding unit, and the twelfth feature encoding unit are identical in composition and structure; and the ninth feature encoding unit, the eleventh feature encoding unit, and the thirteenth feature encoding unit are identical in composition and structure.
19. A feature extraction device characterized by comprising:
the feature extraction module is used for extracting a low-level feature map of the image to be processed through the first feature coding module and extracting a high-level feature map of the image to be processed through the second feature coding module;
the first processing module is used for performing first processing on the low-level feature map through the feature aggregation module to obtain a first feature map with the same resolution as that of the low-level feature map, and performing second processing on the high-level feature map through the feature aggregation module to obtain a second feature map with the same resolution as that of the low-level feature map;
the first feature fusion module is used for fusing the first feature map and the second feature map through the feature aggregation module to obtain a first fused feature map;
the second processing module is used for performing third processing on the low-level feature map through the feature aggregation module to obtain a third feature map with the same resolution as that of the high-level feature map, and performing fourth processing on the high-level feature map through the feature aggregation module to obtain a fourth feature map with the same resolution as that of the high-level feature map;
the second feature fusion module is used for fusing the third feature map and the fourth feature map through the feature aggregation module to obtain a second fused feature map;
and the third feature fusion module is used for sampling the first fusion feature map and the second fusion feature map to the same resolution through the feature aggregation module and then fusing the first fusion feature map and the second fusion feature map to obtain a target fusion feature map corresponding to the image to be processed.
20. A computer-readable storage medium on which a computer program is stored, which, when run on a computer, causes the computer to perform the feature extraction method according to any one of claims 1 to 18.
21. An electronic device comprising a processor and a memory, the memory storing a computer program, wherein the processor is configured to execute the feature extraction method according to any one of claims 1 to 18 by calling the computer program.
CN202110129778.2A 2021-01-29 2021-01-29 Feature extraction method and device, storage medium and electronic equipment Withdrawn CN112949651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110129778.2A CN112949651A (en) 2021-01-29 2021-01-29 Feature extraction method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110129778.2A CN112949651A (en) 2021-01-29 2021-01-29 Feature extraction method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN112949651A true CN112949651A (en) 2021-06-11

Family

ID=76240140

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110129778.2A Withdrawn CN112949651A (en) 2021-01-29 2021-01-29 Feature extraction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112949651A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080655A (en) * 2019-12-04 2020-04-28 腾讯科技(深圳)有限公司 Image segmentation and model training method, device, medium and electronic equipment
CN111292330A (en) * 2020-02-07 2020-06-16 北京工业大学 Image semantic segmentation method and device based on coder and decoder
CN111311629A (en) * 2020-02-21 2020-06-19 京东方科技集团股份有限公司 Image processing method, image processing device and equipment
CN111402258A (en) * 2020-03-12 2020-07-10 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
WO2020177651A1 (en) * 2019-03-01 2020-09-10 华为技术有限公司 Image segmentation method and image processing device
CN111914894A (en) * 2020-06-28 2020-11-10 中国建设银行股份有限公司 Feature extraction method and device, electronic equipment and computer-readable storage medium
US20200364870A1 (en) * 2019-05-14 2020-11-19 University-Industry Cooperation Group Of Kyung Hee University Image segmentation method and apparatus, and computer program thereof


Similar Documents

Publication Publication Date Title
CN111368685B (en) Method and device for identifying key points, readable medium and electronic equipment
CN111598776B (en) Image processing method, image processing device, storage medium and electronic apparatus
CN111476309A (en) Image processing method, model training method, device, equipment and readable medium
CN111696176B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113395542B (en) Video generation method and device based on artificial intelligence, computer equipment and medium
US11443438B2 (en) Network module and distribution method and apparatus, electronic device, and storage medium
CN111369427A (en) Image processing method, image processing device, readable medium and electronic equipment
CN114282581B (en) Training sample acquisition method and device based on data enhancement and electronic equipment
CN111932463B (en) Image processing method, device, equipment and storage medium
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111950570B (en) Target image extraction method, neural network training method and device
CN111414879A (en) Face shielding degree identification method and device, electronic equipment and readable storage medium
CN111860485A (en) Training method of image recognition model, and image recognition method, device and equipment
CN113706440A (en) Image processing method, image processing device, computer equipment and storage medium
CN112950640A (en) Video portrait segmentation method and device, electronic equipment and storage medium
CN113902636A (en) Image deblurring method and device, computer readable medium and electronic equipment
CN113642359B (en) Face image generation method and device, electronic equipment and storage medium
CN113763931A (en) Waveform feature extraction method and device, computer equipment and storage medium
CN111507142A (en) Facial expression image processing method and device and electronic equipment
CN116883708A (en) Image classification method, device, electronic equipment and storage medium
CN111818364B (en) Video fusion method, system, device and medium
CN112949651A (en) Feature extraction method and device, storage medium and electronic equipment
CN114462580A (en) Training method of text recognition model, text recognition method, device and equipment
CN116109531A (en) Image processing method, device, computer equipment and storage medium
CN114399696A (en) Target detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210611