CN112434713A - Image feature extraction method and device, electronic equipment and storage medium - Google Patents
Image feature extraction method and device, electronic equipment and storage medium
- Publication number
- CN112434713A (application CN202011390511.0A)
- Authority
- CN
- China
- Prior art keywords
- node
- nodes
- group
- input
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an image feature extraction method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: obtaining N feature images of different sizes of a target image, wherein N is an integer greater than 2; using the N feature images of different sizes as the input of at least one double-layer bidirectional feature pyramid neural network module; and obtaining the output of the at least one double-layer bidirectional feature pyramid neural network module as the image features of the target image, so as to perform target detection on the target image. The method and device provided by the invention improve the smoothness of information flow in the BiFPN, so that features of different levels are better fused and their expressive power is improved, thereby improving the performance of image processing tasks such as target detection and image classification.
Description
Technical Field
The invention relates to the technical field of computer application, in particular to an image feature extraction method and device, electronic equipment and a storage medium.
Background
The feature pyramid is a basic component of recognition systems that detect objects at different scales. The inherent multi-scale pyramid hierarchy of deep convolutional networks can be exploited to construct a feature pyramid at marginal extra cost. Currently, a top-down architecture with lateral connections is used to build high-level semantic feature maps at all scales; this architecture, called the Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications.
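For orientation, the top-down merge that characterizes an FPN can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not code from the patent: the class name, channel counts, nearest-neighbour upsampling and the assumption that adjacent levels differ in size by a factor of 2 are all choices made here for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal top-down feature pyramid: 1x1 lateral convs + upsample-and-add."""

    def __init__(self, in_channels_list, out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels_list])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels_list])

    def forward(self, feats):
        # feats: backbone maps ordered from largest (lowest level) to smallest
        # (highest level); adjacent levels are assumed to differ by a factor of 2
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down pass
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], scale_factor=2, mode="nearest")
        return [conv(x) for conv, x in zip(self.smooth, laterals)]

# e.g. three backbone maps with 128/256/512 channels at strides 8/16/32
fpn = SimpleFPN([128, 256, 512])
feats = [torch.randn(1, 128, 64, 64),
         torch.randn(1, 256, 32, 32),
         torch.randn(1, 512, 16, 16)]
outs = fpn(feats)  # three maps, all with 256 channels, spatial sizes unchanged
```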
The feature pyramid network currently has various derived structures, such as the Path Aggregation Network (PANet), simplified PANet, NAS-FPN, BiFPN, and so on. BiFPN is the neural network module that fuses features of different levels in EfficientDet (a family of target detection algorithms). BiFPN draws on the structure of PANet by adding a bottom-up feature channel alongside the top-down one, which shortens the path over which information is transmitted and improves the effect of feature fusion. At the same time, BiFPN stacks multiple such feature pyramid layers, further improving feature fusion.
However, in the current planar structure formed by stacking BiFPN layers, information flows from top to bottom, then from bottom to top, and this pattern then keeps repeating. Because there is a strict precedence between the top-down and bottom-up passes, this planar structure can make information transmission less smooth.
Therefore, how to improve the smoothness of information flow in the BiFPN, so as to better fuse the features of different levels, improve their expressive power, and thereby improve target detection performance, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides an image feature extraction method, an image feature extraction device, electronic equipment and a storage medium, so as to improve the smoothness of information flow in BiFPN, better fuse features of different layers, and improve the expression capability of the features, thereby improving the target detection performance.
According to an aspect of the present invention, there is provided an image feature extraction method including:
obtaining N feature images of different sizes of a target image, wherein N is an integer greater than 2;
using the N feature images of different sizes as the input of at least one double-layer bidirectional feature pyramid neural network module;
and obtaining the output of the at least one double-layer bidirectional feature pyramid neural network module as the image features of the target image, so as to perform target detection on the target image.
In some embodiments of the invention, the two-layer bidirectional feature pyramid neural network module comprises:
N input nodes;
the first intermediate layer comprises N-2 groups of nodes, wherein one node in each group of nodes of the first intermediate layer forms a channel from top to bottom, and the other node in each group of nodes of the first intermediate layer forms a channel from bottom to top;
the second intermediate layer comprises N groups of nodes, wherein one node in each group of nodes of the second intermediate layer forms a channel from bottom to top, and the other node in each group of nodes of the second intermediate layer forms a channel from top to bottom;
and N output nodes, each of which is obtained by fusing the corresponding group of nodes of the second intermediate layer.
In some embodiments of the invention, the first intermediate layer comprises:
A node X_{i1} of the i-th group of nodes is obtained by fusing a node X_{i-1,1} of the (i-1)-th group of nodes with the (i+1)-th input node IN_{i+1}, and another node X_{i2} of the i-th group of nodes is obtained by fusing another node X_{i+1,2} of the (i+1)-th group of nodes with the (i+1)-th input node IN_{i+1}, wherein i is an integer greater than 1 and less than N-2, and N is an integer greater than 4.
In some embodiments of the invention, the first intermediate layer comprises:
One node X_{11} of the first group of nodes is obtained by fusing the first input node IN_1 and the second input node IN_2, and another node X_{12} of the first group of nodes is obtained by fusing the second input node IN_2 and another node X_{22} of the second group of nodes;
one node X_{N-2,1} of the (N-2)-th group of nodes is obtained by fusing a node X_{N-3,1} of the (N-3)-th group of nodes and the (N-1)-th input node IN_{N-1}, and another node X_{N-2,2} of the (N-2)-th group of nodes is obtained by fusing the (N-1)-th input node IN_{N-1} and the N-th input node IN_N.
In some embodiments of the invention, the second intermediate layer comprises:
A node Y_{i1} of the i-th group of nodes is obtained by fusing the i-th input node IN_i, a node X_{i-1,1} of the (i-1)-th group of nodes of the first intermediate layer and a node Y_{i+1,1} of the (i+1)-th group of nodes, and another node Y_{i2} of the i-th group of nodes is obtained by fusing the i-th input node IN_i, another node X_{i-1,2} of the (i-1)-th group of nodes of the first intermediate layer and another node Y_{i-1,2} of the (i-1)-th group of nodes.
In some embodiments of the invention, the second intermediate layer comprises:
One node Y_{11} of the first group of nodes is obtained by fusing the first input node IN_1 and a node Y_{21} of the second group of nodes, and another node Y_{12} of the first group of nodes is obtained by fusing the first input node IN_1 and another node X_{12} of the first group of nodes of the first intermediate layer;
a node Y_{N1} of the N-th group of nodes is obtained by fusing the N-th input node IN_N and a node X_{N-2,1} of the (N-2)-th group of nodes of the first intermediate layer, and another node Y_{N2} of the N-th group of nodes is obtained by fusing the N-th input node IN_N and another node Y_{N-1,2} of the (N-1)-th group of nodes.
In some embodiments of the present invention, when the N images of different sizes are input to a plurality of double-layer bidirectional feature pyramid neural network modules, the output of a preceding double-layer bidirectional feature pyramid neural network module is input to the subsequent double-layer bidirectional feature pyramid neural network module.
According to still another aspect of the present invention, there is also provided an image feature extraction device including:
the first acquisition module is used for acquiring N feature images of different sizes of a target image, wherein N is an integer greater than 2;
the input module is used for taking the N feature images of different sizes as the input of the at least one double-layer bidirectional feature pyramid neural network module;
and the output module is used for acquiring the output of the at least one double-layer bidirectional feature pyramid neural network module as the image features of the target image, so as to perform target detection on the target image.
According to still another aspect of the present invention, there is also provided an electronic apparatus, including: a processor; a storage medium having stored thereon a computer program which, when executed by the processor, performs the steps of the image feature extraction method as described above.
According to yet another aspect of the present invention, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the image feature extraction method as described above.
Compared with the prior art, the invention has the advantages that:
the method comprises the steps of expanding a characteristic pyramid of a plane in the BiFPN into a three-dimensional characteristic fusion mode, changing a planar characteristic pyramid network into two planes, wherein one plane is from top to bottom and then from bottom to top, the other plane is from bottom to top and then from bottom to top, then fusing the outputs of the two planes, and finally stacking the two planes to form a new pyramid network structure. Therefore, the image feature extraction method and the device can be applied to image processing functions such as target detection and image classification.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
Fig. 1 shows a flowchart of an image feature extraction method according to an embodiment of the present invention.
Fig. 2 shows a schematic diagram of a BiFPN module with five input nodes.
FIG. 3 shows a schematic diagram of a two-layer bi-directional feature pyramid neural network module, according to an embodiment of the present invention.
Fig. 4 is a block diagram showing an image feature extraction apparatus according to an embodiment of the present invention.
Fig. 5 schematically illustrates a computer-readable storage medium in an exemplary embodiment of the disclosure.
Fig. 6 schematically illustrates an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In order to overcome the defects of the prior art, the invention provides an image feature extraction method and device, electronic equipment and a storage medium, which improve the smoothness of information flow in the BiFPN so that features of different levels are better fused and their expressive power is improved, thereby improving target detection performance.
Referring first to fig. 1, fig. 1 shows a flowchart of an image feature extraction method according to an embodiment of the present invention. The image feature extraction method comprises the following steps:
step S110: n characteristic images with different sizes of the target image are obtained, wherein N is an integer larger than 2.
Specifically, the target image may be passed through a convolutional neural network so that feature images of different sizes are obtained in sequence. Further, the number of feature images obtained in this way may be equal to or greater than N, in which case the last N feature images may be selected in step S110. Among the N feature images, the later a feature image is obtained, the smaller its size.
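As an illustration of step S110, a backbone in which every stage halves the spatial size, so that later feature images are smaller, might look like the following sketch (PyTorch). The layer widths, the use of plain strided convolutions and the class name are assumptions made for this example, not the backbone prescribed by the patent.

```python
import torch
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    """Toy backbone: every stage halves the spatial size, so feature images
    obtained later are smaller, matching the ordering described in step S110."""

    def __init__(self, stage_channels=(32, 64, 128, 256, 512), num_keep=5):
        super().__init__()
        stages, c_in = [], 3
        for c_out in stage_channels:
            stages.append(nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True)))
            c_in = c_out
        self.stages = nn.ModuleList(stages)
        self.num_keep = num_keep  # N in the text above, an integer greater than 2

    def forward(self, image):
        feats, x = [], image
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats[-self.num_keep:]  # keep the last N (smallest) feature images

# e.g. five feature images at strides 2, 4, 8, 16 and 32 of a 256x256 input
feats = MultiScaleBackbone(num_keep=5)(torch.randn(1, 3, 256, 256))
```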
Step S120: the N feature images of different sizes are used as the input of at least one double-layer bidirectional feature pyramid neural network module.
Specifically, the double-layer bidirectional feature pyramid neural network module includes: N input nodes, a first intermediate layer, a second intermediate layer, and N output nodes. The first intermediate layer comprises N-2 groups of nodes, where one node in each group forms a channel from top to bottom and the other node in each group forms a channel from bottom to top. The second intermediate layer comprises N groups of nodes, where one node in each group forms a channel from bottom to top and the other node in each group forms a channel from top to bottom; each output node is obtained by fusing the corresponding group of nodes of the second intermediate layer. The specific structure of the double-layer bidirectional feature pyramid neural network module provided by the present invention is described below with reference to fig. 3.
Specifically, when the N feature images of different sizes are used as the input of a plurality of double-layer bidirectional feature pyramid neural network modules, the output of the previous double-layer bidirectional feature pyramid neural network module is used as the input of the next one. In this way, a plurality of double-layer bidirectional feature pyramid neural network modules can be cascaded laterally.
Step S130: the output of the at least one double-layer bidirectional feature pyramid neural network module is obtained as the image features of the target image, so as to perform target detection on the target image.
In the image feature extraction method provided by the invention, the planar feature pyramid in the BiFPN is expanded into a three-dimensional feature fusion structure. The planar feature pyramid network is changed into two planes, one going from top to bottom and then from bottom to top, the other going from bottom to top and then from top to bottom; the outputs of the two planes are then fused, and finally the two planes are stacked to form a new pyramid network structure. This avoids the loss of smoothness in information flow and fusion efficiency caused by the planar structure, in which the top-down and bottom-up channels of the feature pyramid in the BiFPN have to be traversed one after the other.
Referring now to fig. 2, fig. 2 shows a schematic diagram of a BiFPN module with five input nodes. Fig. 2 shows five input nodes, three first intermediate nodes, five second intermediate nodes and five output nodes. The first intermediate node X_1 is obtained by fusing the input node In_1 and the input node In_2. The first intermediate node X_2 is obtained by fusing the input node In_3 and the first intermediate node X_1. The first intermediate node X_3 is obtained by fusing the input node In_4 and the first intermediate node X_2. Thus, a channel from top to bottom is formed. The second intermediate node Y_5 is obtained by fusing the input node In_5 and the first intermediate node X_3. The second intermediate node Y_4 is obtained by fusing the input node In_4, the first intermediate node X_3 and the second intermediate node Y_5. The second intermediate node Y_3 is obtained by fusing the input node In_3, the first intermediate node X_2 and the second intermediate node Y_4. The second intermediate node Y_2 is obtained by fusing the input node In_2, the first intermediate node X_1 and the second intermediate node Y_3. The second intermediate node Y_1 is obtained by fusing the input node In_1 and the second intermediate node Y_2. Thus, a channel from bottom to top is formed. The five second intermediate nodes are finally output as the five output nodes. In the planar structure formed by stacking such BiFPN layers, information therefore flows from top to bottom, then from bottom to top, and this pattern keeps repeating; because of the strict precedence between the top-down and bottom-up passes, the planar structure can make information transmission less smooth, which affects target detection performance.
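The wiring of fig. 2 described above can be written out as the following sketch (PyTorch). It is an illustration only: fusion is shown as plain addition followed by a 3x3 convolution with ReLU, whereas EfficientDet's BiFPN additionally uses learned fusion weights and depthwise separable convolutions; the class name, channel count, resize operators and the assumption that In1 is the smallest map with adjacent levels differing in size by a factor of 2 are all choices made for the example. Note how the bottom-up nodes Y5..Y1 can only be computed after the entire top-down chain X1..X3 has finished, which is the precedence problem discussed above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def up(x):    # resize a smaller, higher-level map to the next (2x larger) level
    return F.interpolate(x, scale_factor=2, mode="nearest")

def down(x):  # resize a larger, lower-level map to the next (2x smaller) level
    return F.max_pool2d(x, kernel_size=2)

class BiFPNLayer5(nn.Module):
    """One BiFPN layer with five inputs In1..In5, all with `channels` channels;
    In1 is the top (smallest) level, In5 the bottom (largest)."""

    def __init__(self, channels=64):
        super().__init__()
        def fuse_conv():
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True))
        self.fx = nn.ModuleList([fuse_conv() for _ in range(3)])  # X1..X3
        self.fy = nn.ModuleList([fuse_conv() for _ in range(5)])  # Y1..Y5

    def forward(self, ins):
        In1, In2, In3, In4, In5 = ins
        # top-down channel
        X1 = self.fx[0](In2 + up(In1))
        X2 = self.fx[1](In3 + up(X1))
        X3 = self.fx[2](In4 + up(X2))
        # bottom-up channel -- it can only start after the top-down channel is done
        Y5 = self.fy[4](In5 + up(X3))
        Y4 = self.fy[3](In4 + X3 + down(Y5))
        Y3 = self.fy[2](In3 + X2 + down(Y4))
        Y2 = self.fy[1](In2 + X1 + down(Y3))
        Y1 = self.fy[0](In1 + down(Y2))
        return [Y1, Y2, Y3, Y4, Y5]  # the five output nodes
```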
Referring now to fig. 3, fig. 3 illustrates a schematic diagram of a two-layer bi-directional feature pyramid neural network module, according to an embodiment of the present invention.
In the present invention, in the first intermediate layer, a node X_{i1} of the i-th group of nodes is obtained by fusing a node X_{i-1,1} of the (i-1)-th group of nodes with the (i+1)-th input node IN_{i+1}, and another node X_{i2} of the i-th group of nodes is obtained by fusing another node X_{i+1,2} of the (i+1)-th group of nodes with the (i+1)-th input node IN_{i+1}, where i is an integer greater than 1 and less than N-2, and N is an integer greater than 4. Meanwhile, in the first intermediate layer: one node X_{11} of the first group of nodes is obtained by fusing the first input node IN_1 and the second input node IN_2, and another node X_{12} of the first group of nodes is obtained by fusing the second input node IN_2 and another node X_{22} of the second group of nodes; one node X_{N-2,1} of the (N-2)-th group of nodes is obtained by fusing a node X_{N-3,1} of the (N-3)-th group of nodes and the (N-1)-th input node IN_{N-1}, and another node X_{N-2,2} of the (N-2)-th group of nodes is obtained by fusing the (N-1)-th input node IN_{N-1} and the N-th input node IN_N. In the present invention, in the second intermediate layer, a node Y_{i1} of the i-th group of nodes is obtained by fusing the i-th input node IN_i, a node X_{i-1,1} of the (i-1)-th group of nodes of the first intermediate layer and a node Y_{i+1,1} of the (i+1)-th group of nodes, and another node Y_{i2} of the i-th group of nodes is obtained by fusing the i-th input node IN_i, another node X_{i-1,2} of the (i-1)-th group of nodes of the first intermediate layer and another node Y_{i-1,2} of the (i-1)-th group of nodes. Meanwhile, in the second intermediate layer: one node Y_{11} of the first group of nodes is obtained by fusing the first input node IN_1 and a node Y_{21} of the second group of nodes, and another node Y_{12} of the first group of nodes is obtained by fusing the first input node IN_1 and another node X_{12} of the first group of nodes of the first intermediate layer; a node Y_{N1} of the N-th group of nodes is obtained by fusing the N-th input node IN_N and a node X_{N-2,1} of the (N-2)-th group of nodes of the first intermediate layer, and another node Y_{N2} of the N-th group of nodes is obtained by fusing the N-th input node IN_N and another node Y_{N-1,2} of the (N-1)-th group of nodes. The structure of the double-layer bidirectional feature pyramid neural network module described above is shown in fig. 3.
In various embodiments of the present invention, the double-layer bidirectional feature pyramid neural network module may have 3 inputs, 4 inputs, 5 inputs or another number of inputs; the present invention is not limited in this respect. On a downward channel the incoming node is upsampled, on an upward channel the incoming node is downsampled, and an input at the same level needs no sampling operation. The inputs of a node are first added together, and information fusion is then performed by a convolutional layer with activation that does not change the height and depth of the feature map.
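Putting the node wiring of fig. 3 and the fusion rule just described together, one possible sketch of the double-layer bidirectional feature pyramid module is given below (PyTorch). It is a sketch under stated assumptions rather than the patent's reference implementation: fusion is modeled as addition followed by a 3x3 convolution with ReLU, resizing uses nearest-neighbour interpolation and max pooling, IN_1 is taken to be the smallest map with adjacent levels differing in size by a factor of 2, and the class and helper names are invented here. The general rule for X_{i2} is read as fusing the node of the (i+1)-th group, which matches the boundary cases X_{12} and X_{N-2,2} described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def up(x):    # resize a smaller (higher-level) map up by 2x, for downward channels
    return F.interpolate(x, scale_factor=2, mode="nearest")

def down(x):  # resize a larger (lower-level) map down by 2x, for upward channels
    return F.max_pool2d(x, kernel_size=2)

class Fuse(nn.Module):
    """Add the already-resized inputs, then apply a 3x3 convolution with an
    activation; the convolution keeps the feature-map size unchanged."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, *xs):
        return self.conv(sum(xs))

class DualBiFPN(nn.Module):
    """N inputs IN_1..IN_N (IN_1 = top, smallest map; adjacent sizes differ by 2x).
    First intermediate layer: N-2 groups, X_{i1} on a top-down channel and
    X_{i2} on a bottom-up channel; group i sits at the level of IN_{i+1}.
    Second intermediate layer: N groups, Y_{i1} on a bottom-up channel and
    Y_{i2} on a top-down channel; group i sits at the level of IN_i.
    Output node i fuses Y_{i1} and Y_{i2}."""

    def __init__(self, num_levels, channels=64):
        super().__init__()
        assert num_levels > 2
        self.N = num_levels
        self.fx1 = nn.ModuleList([Fuse(channels) for _ in range(num_levels - 2)])
        self.fx2 = nn.ModuleList([Fuse(channels) for _ in range(num_levels - 2)])
        self.fy1 = nn.ModuleList([Fuse(channels) for _ in range(num_levels)])
        self.fy2 = nn.ModuleList([Fuse(channels) for _ in range(num_levels)])
        self.fout = nn.ModuleList([Fuse(channels) for _ in range(num_levels)])

    def forward(self, ins):                      # ins[i-1] corresponds to IN_i
        N = self.N
        x1, x2 = [None] * (N - 2), [None] * (N - 2)
        # first intermediate layer, top-down channel: X_11, X_21, ..., X_{N-2,1}
        x1[0] = self.fx1[0](up(ins[0]), ins[1])
        for j in range(1, N - 2):
            x1[j] = self.fx1[j](up(x1[j - 1]), ins[j + 1])
        # first intermediate layer, bottom-up channel: X_{N-2,2}, ..., X_12
        x2[N - 3] = self.fx2[N - 3](ins[N - 2], down(ins[N - 1]))
        for j in range(N - 4, -1, -1):
            x2[j] = self.fx2[j](ins[j + 1], down(x2[j + 1]))
        # second intermediate layer, bottom-up channel: Y_N1, ..., Y_11
        y1, y2 = [None] * N, [None] * N
        y1[N - 1] = self.fy1[N - 1](ins[N - 1], up(x1[N - 3]))
        for j in range(N - 2, 0, -1):
            y1[j] = self.fy1[j](ins[j], x1[j - 1], down(y1[j + 1]))
        y1[0] = self.fy1[0](ins[0], down(y1[1]))
        # second intermediate layer, top-down channel: Y_12, ..., Y_N2
        y2[0] = self.fy2[0](ins[0], down(x2[0]))
        for j in range(1, N - 1):
            y2[j] = self.fy2[j](ins[j], x2[j - 1], up(y2[j - 1]))
        y2[N - 1] = self.fy2[N - 1](ins[N - 1], up(y2[N - 2]))
        # output nodes: fuse the two nodes of each second-layer group
        return [self.fout[j](y1[j], y2[j]) for j in range(N)]

# cascading modules (step S120, "at least one" module): the output list of one
# module is simply fed to the next
m1, m2 = DualBiFPN(5), DualBiFPN(5)
feats = [torch.randn(1, 64, 2 ** (3 + i), 2 ** (3 + i)) for i in range(5)]  # IN_1..IN_5
outs = m2(m1(feats))
```

Cascading a plurality of such modules, as mentioned above, thus reduces to chaining their calls, as the last lines of the sketch show.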
The foregoing is merely an exemplary description of various implementations of the invention and is not intended to be limiting thereof.
The invention also provides an image feature extraction device, and fig. 4 is a schematic diagram of the image feature extraction device according to the embodiment of the invention. The image feature extraction apparatus 200 includes a first obtaining module 210, an input module 220, and an output module 230.
The first obtaining module 210 is configured to obtain N feature images of different sizes of a target image, where N is an integer greater than 2.
The input module 220 is configured to take the N feature images of different sizes as the input of the at least one double-layer bidirectional feature pyramid neural network module.
The output module 230 is configured to obtain the output of the at least one double-layer bidirectional feature pyramid neural network module as the image features of the target image, so as to perform target detection on the target image.
In the image feature extraction device provided by the invention, the planar feature pyramid in the BiFPN is expanded into a three-dimensional feature fusion structure: the planar feature pyramid network is changed into two planes, one going from top to bottom and then from bottom to top, the other going from bottom to top and then from top to bottom; the outputs of the two planes are then fused, and finally the two planes are stacked to form a new pyramid network structure.
Fig. 4 is only a schematic diagram illustrating the image feature extraction apparatus provided by the present invention, and the splitting, combining, and adding of modules are within the scope of the present invention without departing from the concept of the present invention. The image feature extraction device provided by the present invention can be implemented by software, hardware, firmware, plug-in, and any combination thereof, and the present invention is not limited thereto.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a computer program is stored, which when executed by, for example, a processor, may implement the steps of the image feature extraction method described in any one of the above embodiments. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the present invention as described in the image feature extraction method section above of this specification, when said program product is run on the terminal device.
Referring to fig. 5, a program product 400 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the tenant computing device, partly on the tenant device, as a stand-alone software package, partly on the tenant computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing devices may be connected to the tenant computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In an exemplary embodiment of the present disclosure, there is also provided an electronic device, which may include a processor, and a memory for storing executable instructions of the processor. Wherein the processor is configured to perform the steps of the image feature extraction method in any of the above embodiments via execution of the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit", "module" or "system".
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
Wherein the storage unit stores program code executable by the processing unit 610 to cause the processing unit 610 to perform steps according to various exemplary embodiments of the present invention described in the image feature extraction method section above in this specification. For example, the processing unit 610 may perform the steps as shown in fig. 1 to 2.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a tenant to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the above-mentioned image feature extraction method according to the embodiments of the present disclosure.
Compared with the prior art, the invention has the advantages that:
the method comprises the steps of expanding a characteristic pyramid of a plane in the BiFPN into a three-dimensional characteristic fusion mode, changing a planar characteristic pyramid network into two planes, wherein one plane is from top to bottom and then from bottom to top, the other plane is from bottom to top and then from bottom to top, then fusing the outputs of the two planes, and finally stacking the two planes to form a new pyramid network structure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
Claims (10)
1. An image feature extraction method, characterized by comprising:
obtaining N characteristic images with different sizes of a target image, wherein N is an integer larger than 2;
taking N images with different sizes as the input of at least one double-layer bidirectional characteristic pyramid neural network module;
and acquiring the output of the at least one double-layer bidirectional characteristic pyramid neural network module as the image characteristic of the target image so as to perform target detection on the target image.
2. The image feature extraction method of claim 1, wherein the two-layer bidirectional feature pyramid neural network module comprises:
N input nodes;
the first intermediate layer comprises N-2 groups of nodes, wherein one node in each group of nodes of the first intermediate layer forms a channel from top to bottom, and the other node in each group of nodes of the first intermediate layer forms a channel from bottom to top;
the second intermediate layer comprises N groups of nodes, wherein one node in each group of nodes of the second intermediate layer forms a channel from bottom to top, and the other node in each group of nodes of the second intermediate layer forms a channel from top to bottom;
and N output nodes, each of which is obtained by fusing the corresponding group of nodes of the second intermediate layer.
3. The image feature extraction method according to claim 2, wherein, in the first intermediate layer:
A node X_{i1} of the i-th group of nodes is obtained by fusing a node X_{i-1,1} of the (i-1)-th group of nodes with the (i+1)-th input node IN_{i+1}, and another node X_{i2} of the i-th group of nodes is obtained by fusing another node X_{i+1,2} of the (i+1)-th group of nodes with the (i+1)-th input node IN_{i+1}, wherein i is an integer greater than 1 and less than N-2, and N is an integer greater than 4.
4. The image feature extraction method according to claim 3, wherein, in the first intermediate layer:
One node X_{11} of the first group of nodes is obtained by fusing the first input node IN_1 and the second input node IN_2, and another node X_{12} of the first group of nodes is obtained by fusing the second input node IN_2 and another node X_{22} of the second group of nodes;
one node X_{N-2,1} of the (N-2)-th group of nodes is obtained by fusing a node X_{N-3,1} of the (N-3)-th group of nodes and the (N-1)-th input node IN_{N-1}, and another node X_{N-2,2} of the (N-2)-th group of nodes is obtained by fusing the (N-1)-th input node IN_{N-1} and the N-th input node IN_N.
5. The image feature extraction method according to claim 4, wherein, in the second intermediate layer:
A node Y_{i1} of the i-th group of nodes is obtained by fusing the i-th input node IN_i, a node X_{i-1,1} of the (i-1)-th group of nodes of the first intermediate layer and a node Y_{i+1,1} of the (i+1)-th group of nodes, and another node Y_{i2} of the i-th group of nodes is obtained by fusing the i-th input node IN_i, another node X_{i-1,2} of the (i-1)-th group of nodes of the first intermediate layer and another node Y_{i-1,2} of the (i-1)-th group of nodes.
6. The image feature extraction method according to claim 5, wherein, in the second intermediate layer:
One node Y_{11} of the first group of nodes is obtained by fusing the first input node IN_1 and a node Y_{21} of the second group of nodes, and another node Y_{12} of the first group of nodes is obtained by fusing the first input node IN_1 and another node X_{12} of the first group of nodes of the first intermediate layer;
a node Y_{N1} of the N-th group of nodes is obtained by fusing the N-th input node IN_N and a node X_{N-2,1} of the (N-2)-th group of nodes of the first intermediate layer, and another node Y_{N2} of the N-th group of nodes is obtained by fusing the N-th input node IN_N and another node Y_{N-1,2} of the (N-1)-th group of nodes.
7. The image feature extraction method of any one of claims 1 to 6, wherein, when the N images of different sizes are used as the input of a plurality of two-level bidirectional feature pyramid neural network modules, the output of a preceding two-level bidirectional feature pyramid neural network module is used as the input of the next two-level bidirectional feature pyramid neural network module.
8. An image feature extraction device characterized by comprising:
the first acquisition module is used for acquiring N characteristic images with different sizes of a target image, wherein N is an integer greater than 2;
the input module is used for taking the N images with different sizes as the input of the at least one double-layer bidirectional characteristic pyramid neural network module;
and the output module is used for acquiring the output of the at least one double-layer bidirectional characteristic pyramid neural network module as the image characteristic of the target image so as to perform target detection on the target image.
9. An electronic device, characterized in that the electronic device comprises:
a processor;
a storage medium having stored thereon a computer program which, when executed by the processor, performs the image feature extraction method according to any one of claims 1 to 7.
10. A storage medium having stored thereon a computer program which, when executed by a processor, performs the image feature extraction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011390511.0A CN112434713A (en) | 2020-12-02 | 2020-12-02 | Image feature extraction method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011390511.0A CN112434713A (en) | 2020-12-02 | 2020-12-02 | Image feature extraction method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112434713A true CN112434713A (en) | 2021-03-02 |
Family
ID=74698908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011390511.0A Pending CN112434713A (en) | 2020-12-02 | 2020-12-02 | Image feature extraction method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112434713A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113313668A (en) * | 2021-04-19 | 2021-08-27 | 石家庄铁道大学 | Subway tunnel surface disease feature extraction method |
CN113361375A (en) * | 2021-06-02 | 2021-09-07 | 武汉理工大学 | Vehicle target identification method based on improved BiFPN |
CN114972713A (en) * | 2022-04-29 | 2022-08-30 | 北京开拓鸿业高科技有限公司 | Area positioning method, device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108986137A (en) * | 2017-11-30 | 2018-12-11 | 成都通甲优博科技有限责任公司 | Human body tracing method, device and equipment |
CN109614876A (en) * | 2018-11-16 | 2019-04-12 | 北京市商汤科技开发有限公司 | Critical point detection method and device, electronic equipment and storage medium |
CN111291739A (en) * | 2020-05-09 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Face detection and image detection neural network training method, device and equipment |
CN111914937A (en) * | 2020-08-05 | 2020-11-10 | 湖北工业大学 | Lightweight improved target detection method and detection system |
CN111967538A (en) * | 2020-09-25 | 2020-11-20 | 北京百度网讯科技有限公司 | Feature fusion method, device and equipment applied to small target detection and storage medium |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108986137A (en) * | 2017-11-30 | 2018-12-11 | 成都通甲优博科技有限责任公司 | Human body tracing method, device and equipment |
CN109614876A (en) * | 2018-11-16 | 2019-04-12 | 北京市商汤科技开发有限公司 | Critical point detection method and device, electronic equipment and storage medium |
US20200250462A1 (en) * | 2018-11-16 | 2020-08-06 | Beijing Sensetime Technology Development Co., Ltd. | Key point detection method and apparatus, and storage medium |
CN111291739A (en) * | 2020-05-09 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Face detection and image detection neural network training method, device and equipment |
CN111914937A (en) * | 2020-08-05 | 2020-11-10 | 湖北工业大学 | Lightweight improved target detection method and detection system |
CN111967538A (en) * | 2020-09-25 | 2020-11-20 | 北京百度网讯科技有限公司 | Feature fusion method, device and equipment applied to small target detection and storage medium |
Non-Patent Citations (1)
Title |
---|
徐成琪, 洪学海: "Feature Pyramid Object Detection Network Based on Function Preservation" (基于功能保持的特征金字塔目标检测网络), 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 30 June 2020 (2020-06-30), pages 507-516 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113313668A (en) * | 2021-04-19 | 2021-08-27 | 石家庄铁道大学 | Subway tunnel surface disease feature extraction method |
CN113361375A (en) * | 2021-06-02 | 2021-09-07 | 武汉理工大学 | Vehicle target identification method based on improved BiFPN |
CN113361375B (en) * | 2021-06-02 | 2022-06-07 | 武汉理工大学 | Vehicle target identification method based on improved BiFPN |
CN114972713A (en) * | 2022-04-29 | 2022-08-30 | 北京开拓鸿业高科技有限公司 | Area positioning method, device, storage medium and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112434713A (en) | Image feature extraction method and device, electronic equipment and storage medium | |
KR20210042864A (en) | Table recognition method, device, equipment, medium and computer program | |
US7024419B1 (en) | Network visualization tool utilizing iterative rearrangement of nodes on a grid lattice using gradient method | |
CN108629414A (en) | depth hash learning method and device | |
KR102553763B1 (en) | Video event recognition method and device, electronic equipment and storage medium | |
CN116822452B (en) | Chip layout optimization method and related equipment | |
CN110569972A (en) | search space construction method and device of hyper network and electronic equipment | |
CN111312223B (en) | Training method and device of voice segmentation model and electronic equipment | |
JP2021128779A (en) | Method, device, apparatus, and storage medium for expanding data | |
CN112906865A (en) | Neural network architecture searching method and device, electronic equipment and storage medium | |
WO2024001653A1 (en) | Feature extraction method and apparatus, storage medium, and electronic device | |
US11645323B2 (en) | Coarse-to-fine multimodal gallery search system with attention-based neural network models | |
CN111312224B (en) | Training method and device of voice segmentation model and electronic equipment | |
CN116186330B (en) | Video deduplication method and device based on multi-mode learning | |
US9886652B2 (en) | Computerized correspondence estimation using distinctively matched patches | |
CN115965074A (en) | Training method of deep learning model, data processing method, device and equipment | |
CN115186738A (en) | Model training method, device and storage medium | |
CN111582456B (en) | Method, apparatus, device and medium for generating network model information | |
US9465905B1 (en) | Structure for static random access memory | |
CN114417856A (en) | Text sparse coding method and device and electronic equipment | |
CN110378378A (en) | Fact retrieval method, apparatus, computer equipment and storage medium | |
JP4391464B2 (en) | Device for storing binary tree structure information and device for storing heap structure information | |
CN116227391B (en) | Fault-tolerant Josephson junction array, dynamic ternary design method and equipment | |
CN114372238B (en) | Distributed state estimation method | |
CN116541421B (en) | Address query information generation method and device, electronic equipment and computer medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |