CN111382796A

CN111382796A - Image feature extraction method, device, equipment and storage medium

Info

Publication number: CN111382796A
Application number: CN202010158252.2A
Authority: CN
Inventors: 高玮; 张超; 胡浩; 杨超龙
Original assignee: Guangdong Bozhilin Robot Co Ltd
Current assignee: Guangdong Bozhilin Robot Co Ltd
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2020-07-07

Abstract

The embodiment of the invention discloses an image feature extraction method, device, equipment and storage medium. The method comprises the following steps: acquiring an image to be extracted; inputting the image to be extracted into a pre-trained target feature extraction model to obtain an output target feature image; the target feature extraction model comprises a first feature extraction model and a second feature extraction model, the first feature extraction model is obtained by training based on a first fusion image obtained by fusing an original building image and an original feature image corresponding to the original building image, and the second feature extraction model is obtained by training based on a second fusion image obtained by fusing the original training image and an output image of the first feature extraction model. According to the embodiment of the invention, the image after image fusion is used as the training image to train the target feature extraction model, so that the problem of labor cost in image feature extraction is solved, and the accuracy of image feature extraction is improved.

Description

Image feature extraction method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of image processing, in particular to an image feature extraction method, device, equipment and storage medium.

Background

Matting is a function often used in image processing, i.e. separating a target image in an image into individual images, and the image processing operation has high requirements on positioning and identifying the target image. At present, image data in the construction drawing is huge, great burden is brought to manual drawing verification, and working efficiency is low, so that automatic drawing verification is very important.

In the process of drawing verification, effective information extraction needs to be carried out on the construction drawings, manual screening is difficult to complete due to the fact that data are too large, and drawings with useless information removed need to be generated in batches by means of image processing. The core ideas in the prior art are basically based on a man-machine combination method, the identification scheme can achieve the purpose of intelligent identification to a certain extent, but certain defects exist, such as low identification precision and identification efficiency, repeated operation by a man-machine is needed, and the burden of personnel is heavy.

Disclosure of Invention

The embodiment of the invention provides an image feature extraction method, device and equipment and a storage medium, which are used for improving the accuracy and working efficiency of image feature extraction.

In a first aspect, an embodiment of the present invention provides an image feature extraction method, where the method includes:

acquiring an image to be extracted;

inputting the image to be extracted into a pre-trained target feature extraction model to obtain an output target feature image;

the target feature extraction model comprises a first feature extraction model and a second feature extraction model, the first feature extraction model is obtained by training based on a first fusion image obtained by fusing an original building image and an original feature image corresponding to the original building image, and the second feature extraction model is obtained by training based on a second fusion image obtained by fusing the original training image and an output image of the first feature extraction model.

In a second aspect, an embodiment of the present invention further provides an apparatus for extracting image features, where the apparatus includes:

the image to be extracted acquisition module is used for acquiring an image to be extracted;

the target characteristic image output module is used for inputting the image to be extracted into a target characteristic extraction model which is trained in advance to obtain an output target characteristic image;

In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:

one or more processors;

a memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the image feature extraction methods referred to above.

In a fourth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform any one of the above-mentioned image feature extraction methods.

According to the embodiment of the invention, the image after image fusion is used as the training image to train the first feature extraction model and the second feature extraction model, so that the problem of labor cost in image feature extraction is solved, and the precision and the working efficiency of image feature extraction are improved.

Drawings

Fig. 1 is a flowchart of an image feature extraction method according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of an original architectural image according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of an original feature image according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of an output image of a first feature extraction model according to an embodiment of the present invention.

Fig. 5 is a flowchart of an image feature extraction method according to a second embodiment of the present invention.

Fig. 6 is a schematic diagram of a method for constructing a target feature extraction model according to a second embodiment of the present invention.

Fig. 7 is a schematic diagram of an image feature extraction device according to a third embodiment of the present invention.

Fig. 8 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of an image feature extraction method according to an embodiment of the present invention, where the embodiment is applicable to a case of extracting an original feature image in an input image, and the method may be executed by an image feature extraction device, which may be implemented in software and/or hardware, and may be configured in a terminal device. The method specifically comprises the following steps:

and S110, acquiring an image to be extracted.

The image to be extracted is an image to be input into a pre-trained target feature extraction model. Illustratively, the type of image to be extracted may be an architectural image. In an exemplary embodiment, acquiring an image to be extracted includes: receiving an image to be extracted input by a user and/or reading the image to be extracted stored in the equipment. The manner of acquiring the image to be extracted is not limited herein. Illustratively, the file format of the image to be extracted may be jpg, png or pdf format, and the file format of the image to be extracted is not limited herein.

In an embodiment, optionally, the image to be extracted is converted into a binary image, and the subsequent steps are performed according to the converted image to be extracted. The binary image means that the gray levels of the pixel points in the image are only two, that is, the gray value of any pixel point in the binary image is 0 or 255. The method has the advantages that the image feature information in the image to be extracted can be highlighted, and the feature extraction effect of the target feature extraction model on the image to be extracted is further improved.

And S120, inputting the image to be extracted into a pre-trained target feature extraction model to obtain an output target feature image.

The target feature extraction model comprises a first feature extraction model and a second feature extraction model, the first feature extraction model is obtained by training a first fusion image obtained by fusing an original building image and an original feature image corresponding to the original building image, and the second feature extraction model is obtained by training a second fusion image obtained by fusing the original building image and an output image of the first feature extraction model.

Wherein the original building image may be a technical image for characterizing the internal arrangement of the building. Illustratively, the building image may include at least one of building information of an external shape of a building, a finishing device, a structural configuration, a caption, a ruler, and the like. Fig. 2 is a schematic diagram of an original building image according to an embodiment of the present invention, in which fig. 2 shows floor plan information of a building, and fig. 2 shows building information of a building, such as a bearing wall, an air conditioning position, a door opening direction, and a window, for example. The dotted horizontal and vertical lines in fig. 2 may be used for coordinate reference for the installation position of interior finishing in a building. It should be noted that the schematic diagram of the original architectural image shown in fig. 2 is only used for illustrating the original architectural image, and is not limited to the original architectural image. The present embodiment does not limit the building information and the building illustrations in the original building image.

In one embodiment, optionally, the original building image is converted to a binary image. The method has the advantages that the image feature information in the original building image can be highlighted, and the feature extraction effect of the target feature extraction model is improved.

The original feature image corresponding to the original building image is an original feature image obtained by extracting a feature object on the basis of the original building image. For example, the type of the original feature image may be a binary image. In one embodiment, the original feature image includes feature objects. On this basis, the original feature image may further include connecting line information such as background lines. Fig. 3 is a schematic diagram of an original feature image according to an embodiment of the present invention, as shown in fig. 3, the original feature image includes a feature object that needs to be subjected to feature extraction in an original building image, and fig. 3 exemplarily illustrates the feature object included in the original feature image, such as a brick wall, an elevator, a fire hydrant, a bearing wall, a bay window, a bedroom side hung door, a bathroom side hung door, a window, a balcony sliding door, an air conditioner, a ladder, an electric well door, and the like. In an embodiment, optionally, an image segmentation method is used to segment the original building image to obtain an original feature image. Illustratively, the image segmentation method includes at least one of a gray threshold segmentation method, a clustering-based segmentation method, and a Mask R-CNN image instance segmentation method. In another embodiment, the original building image may be manually labeled to obtain an original feature image. It should be noted that the schematic diagram of the original feature image shown in fig. 3 is only used to illustrate the original feature image, and is not limited to the original feature image. The present embodiment does not limit the feature objects and the illustrations of the feature objects included in the original feature image.

In one embodiment, optionally, the first feature extraction model comprises a codec network model and the second feature extraction model comprises a full convolution network model. In one embodiment, optionally, the model parameters of the codec network model are adjusted based on the first loss function to obtain a trained first feature extraction model; and adjusting the model parameters of the full convolution network model based on the second loss function to obtain a trained second feature extraction model.

The method includes the steps of obtaining a first fusion image obtained by image fusion of an original building image and an original feature image corresponding to the original building image, wherein the first fusion image is used as an input image of a first feature extraction model, namely, the first fusion image is used as a training image of the first feature extraction model. The advantage of such an arrangement is that the feature information of the original building image and the position information of the feature information in the original building image are highlighted, thereby improving the efficiency and accuracy of feature extraction of the original building image by the first feature extraction model.

In one embodiment, optionally, the first fused image is input to a codec network model, and the encoder network performs feature extraction on the first fused image by using downsampling of a convolutional layer and a pooling layer to obtain an encoded image; the decoder network performs feature reconstruction on the encoded image output by the encoder network by using a convolution layer and transposed convolution upsampling to obtain an output image of the first feature extraction model. The pooling layer is a method for obtaining a feature image by sampling an input image in a partition manner, and may be, for example, a maximum pooling layer, where the maximum pooling layer is a result of pooling a maximum value in each region, so as to extract features of the image. In one embodiment, the decoder network structure is optionally less than the encoder network structure. Illustratively, the number of nodes of the decoder network structure is less than the number of nodes of the encoder network structure, or the number of layers of the decoder network structure is less than the number of layers of the encoder network structure. The advantage of this arrangement is that since the image data of the output image of the encoder network is less than the image data of the first fused image, the training process of the first feature extraction model can be accelerated while the feature reconstruction quality is ensured by reducing the decoder network structure.

Fig. 4 is a schematic diagram of an output image of a first feature extraction model according to an embodiment of the present invention. Compared with fig. 2, the predicted image output by the first feature extraction model shown in fig. 4 is the first target feature image obtained by removing the interference information in fig. 2. The interference information includes, but is not limited to, background lines, text descriptions, and interfering objects, wherein the interfering objects include non-feature objects. The first target feature image contains more global information, contains valid information elements, etc. Wherein the valid information element comprises a feature object. It should be noted that the schematic diagram of the output image of the first feature extraction model shown in fig. 4 is only an example of the output image of the first feature extraction model, and the output image of the first feature extraction model is not limited. For example, the line definition of the output image of the first feature extraction model may be the same as or different from the line definition of the original building image. The presentation effect of the output image of the first feature extraction model needs to be determined according to the training result in the actual operation process, and is not limited here.

In an embodiment, optionally, in the decoder network, the feature reconstruction is performed on the output image of the encoder by using a nearest neighbor interpolation algorithm, so as to obtain an output image of the first feature extraction model. In another embodiment, optionally, in the decoder network, a bilinear interpolation algorithm or a bicubic interpolation algorithm is used to perform feature reconstruction on the output image of the encoder, so as to obtain an output image of the first feature extraction model. The interpolation algorithm used is not particularly limited.

Wherein, the loss function can be used as a function for evaluating the difference degree between the predicted image output by the feature extraction model and the standard image. In one embodiment, the smaller the loss function, the better the robustness of the representation feature extraction model. The standard image refers to a target characteristic image meeting the requirement, that is, the standard image is an image from which all interference information is removed. Exemplary interference information includes, but is not limited to, written descriptions and non-characteristic objects, among others. The interference information is not limited here, and the interference information may be specifically limited according to actual operation requirements. In one embodiment, the loss function optionally includes, but is not limited to, a squared loss function, an exponential loss function, an absolute value loss function, a logarithmic loss function, and a combined loss function. Wherein, the combined loss function is a loss function formed by combining at least two loss functions. The first loss function and the second loss function may be the same or different, and the first loss function and the second loss function are not limited herein.

And performing image fusion on the original building image and the output image of the first feature extraction model to obtain a second fusion image serving as an input image of the second feature extraction model, namely, taking the second fusion image as a training image of the second feature extraction model. The full convolution network model adopted by the second feature extraction model is an optimizer network model. The advantage of such an arrangement is that after the output image of the first feature extraction model is input into the second feature extraction model, the output image of the second feature extraction model has obviously increased detail features compared with the output image of the first feature extraction model, and details, such as lines, in the output image of the first feature extraction model are highlighted, so that the output image of the second feature extraction model has sharper edges, and the precision and quality of the output image passing through the target feature extraction model are improved.

According to the technical scheme, the image after image fusion is used as the training image, the first feature extraction model and the second feature extraction model are trained, the problem of labor cost in image feature extraction is solved, and the accuracy and the working efficiency of image feature extraction are improved.

Example two

Fig. 5 is a flowchart of an image feature extraction method according to a second embodiment of the present invention, and the technical solution of this embodiment is further detailed based on the foregoing embodiment. Optionally, the method further includes: a first loss function is constructed based on the predicted loss function, the partial loss function, and the loss weight.

The specific implementation steps of this embodiment include:

s210, constructing a first loss function based on the prediction loss function, the partial loss function and the loss weight.

In one embodiment, optionally, the first loss function satisfies the following equation:

L_overall＝w₁L_α+(1-w₁)L_c

wherein L is_overallRepresents a first loss value, w₁Weight representing predictive loss function, L_αRepresenting the predicted loss value, L_cThe partial loss value is indicated. And the preset loss value and the partial loss value are obtained by calculation according to the prediction loss function and the partial loss function respectively.

In one embodiment, optionally, the predictive loss function is constructed based on alpha channel values of the output image of the first feature extraction model and the standard image. In one embodiment, the predictive loss function satisfies the following equation:

wherein the content of the first and second substances,

representing the predicted loss value at pixel point i,

the output image representing the first feature extraction model has alpha channel values at pixel point i,

representing the alpha channel value of the original building image at pixel point i, where,

the predicted loss value in the first loss function comprises a loss value obtained by adding the predicted loss values at each pixel point in the image. Wherein the image includes R, G, B and A four channels, wherein the R channel represents red, the G channel represents green, the B channel represents blue, and the A channel represents an alpha channel, describing a color space.

In one embodiment, optionally, the partial loss function is constructed based on RGB values of a fused image of the output image of the first feature extraction model and the original background image corresponding to the original architectural image, and RGB values of the original architectural image. In one embodiment, the partial loss function satisfies the following equation:

wherein the content of the first and second substances,

representing the partial loss value at pixel point i,

the fused image representing the output image of the first initial feature extraction model and the original background image corresponding to the original building image has RGB values at pixel point i,

representing the RGB values of the original building image at pixel point i, where,

the original background image comprises an image obtained by removing the characteristic object in the original building image on the basis of the original building image. In one embodiment, the original building image can be obtained by overlaying and fusing the original background image and the original feature image. The partial loss values in the first loss function include loss values obtained by adding partial loss values at each pixel point in the image. The partial loss function has the advantages that the neural network can be prevented from carrying out compound operation, the training process is accelerated, and the first feature extraction model outputs more accurate prediction images.

S220, adjusting model parameters of the coder-decoder network model based on the first loss function to obtain a trained first feature extraction model.

And when a first loss value calculated according to the prediction loss value and the partial loss value is converged, obtaining a trained first feature extraction model.

And S230, adjusting model parameters of the full convolution network model based on the second loss function to obtain a trained second feature extraction model.

In one embodiment, optionally, the second loss function is constructed based on an alpha channel value of a fused image of the output image of the second feature extraction model and the output image of the first feature extraction model, and an alpha channel value of the standard image. In one embodiment, the second loss function satisfies the following equation:

wherein the content of the first and second substances,

representing a second loss value at pixel point i,

the fused image representing the output image of the second feature extraction model and the output image of the first feature extraction model has an alpha channel value at pixel point i,

representing the alpha channel value of the standard image at pixel point i, where,

and adjusting the model parameters of the full convolution network model according to the calculated second loss value, and obtaining a trained second feature extraction model when the second loss value is converged.

Fig. 6 is a schematic diagram of a method for constructing a target feature extraction model according to a second embodiment of the present invention. And fusing the original building image and the original characteristic image corresponding to the original building image and inputting the fused image into a coding-decoder network model, wherein the coding network and the decoder network can be connected in a jump link mode. And adjusting model parameters of the coder-decoder network model according to a prediction loss value obtained by calculating an output image of the coder-decoder network model and a standard image, and adjusting the model parameters of the coder-decoder network model according to an image obtained by fusing the output image of the coder-decoder network model and an original background image corresponding to the original building image and a partial loss value obtained by calculating the original building image. And obtaining a first feature extraction model after the network model of the coder-decoder is converged. And fusing the output image of the first feature extraction model and the original building image, inputting the fused image into a full convolution network model, adjusting model parameters of the full convolution network model according to a prediction loss value obtained by calculation of the output image of the full convolution network model, the fused image of the output image of the first feature extraction model and a standard image, and obtaining a second feature extraction model after the full convolution network model is converged.

It should be noted that the schematic diagram shown in fig. 6 includes the original building image shown in fig. 2 and the original feature image shown in fig. 3. The effective element map is the original feature image described in the present embodiment, the real background image is the original background image described in the present embodiment, and the real image is the original architectural image described in the present embodiment. The images included in fig. 6 are only for illustrating the process of constructing the target feature extraction model, so as to more intuitively understand the process of constructing the target feature extraction model, and the images used in the process of constructing the target feature extraction model are not limited.

And S240, acquiring an image to be extracted.

And S250, inputting the image to be extracted into a pre-trained target feature extraction model to obtain an output target feature image, wherein the target feature extraction model comprises a first feature extraction model and a second feature extraction model.

According to the technical scheme of the embodiment, the target feature extraction model is trained by adopting the prediction loss function and the partial loss function, so that the problem of low image feature extraction precision is solved, and the precision and the quality of image feature extraction are improved.

EXAMPLE III

Fig. 7 is a schematic diagram of an image feature extraction device according to a third embodiment of the present invention. The embodiment is applicable to the case of extracting the original characteristic image from the input image, and the apparatus can be implemented in software and/or hardware, and the apparatus can be configured in a terminal device. The image feature extraction device includes: an image to be extracted acquisition module 310 and a target characteristic image output module 320.

The to-be-extracted image obtaining module 310 is configured to obtain an image to be extracted;

the target feature image output module 320 is configured to input an image to be extracted into a pre-trained target feature extraction model to obtain an output target feature image;

On the basis of the above technical solution, optionally, the apparatus further includes a target feature extraction model building module, specifically configured to:

adjusting model parameters of the coder-decoder network model based on the first loss function to obtain a trained first feature extraction model;

and adjusting the model parameters of the full convolution network model based on the second loss function to obtain a trained second feature extraction model.

Optionally, the apparatus further comprises:

and the characteristic reconstruction module is used for performing characteristic reconstruction on the output image of the encoder by adopting a nearest neighbor interpolation algorithm in the decoder network to obtain the output image of the first characteristic extraction model.

Optionally, the apparatus further comprises:

and the first loss function constructing module is used for constructing a first loss function based on the prediction loss function, the partial loss function and the loss weight.

Optionally, the first loss function constructing module includes:

and the prediction loss function construction unit is used for constructing a prediction loss function based on the alpha channel values of the output image of the first feature extraction model and the standard image.

Optionally, the first loss function constructing module includes:

and the partial loss function construction unit is used for constructing a partial loss function based on the RGB values of the fused image of the output image of the first feature extraction model and the original background image corresponding to the original building image and the RGB values of the original building image.

Optionally, the apparatus further comprises:

and the second loss function building module is used for building a second loss function based on the alpha channel value of the fused image of the output image of the second characteristic extraction model and the output image of the first characteristic extraction model and the alpha channel value of the standard image.

The image feature extraction device provided by the embodiment of the invention can be used for executing the image feature extraction method provided by the embodiment of the invention, and has corresponding functions and beneficial effects of the execution method.

It should be noted that, in the embodiment of the above image feature extraction apparatus, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

Example four

Fig. 8 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention, where the embodiment of the present invention provides a service for implementing the image feature extraction method according to the foregoing embodiment of the present invention, and the image feature extraction device according to the foregoing embodiment may be configured. FIG. 8 illustrates a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the present invention. The device 12 shown in fig. 8 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the present invention.

As shown in FIG. 8, device 12 is in the form of a general purpose computing device. The components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, micro-channel architecture (MAC) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with device 12, and/or with any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown in FIG. 8, the network adapter 20 communicates with the other modules of the device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, implementing an image feature extraction method provided by an embodiment of the present invention.

Through the equipment, the problem of labor cost in image feature extraction is solved, and the precision and the working efficiency of the image feature extraction are improved.

EXAMPLE five

An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a method for extracting image features, and the method includes:

acquiring an image to be extracted;

inputting an image to be extracted into a pre-trained target feature extraction model to obtain an output target feature image;

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Of course, the storage medium provided by the embodiment of the present invention contains computer-executable instructions, and the computer-executable instructions are not limited to the above method operations, and may also perform related operations in the image feature extraction method provided by any embodiment of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. An image feature extraction method is characterized by comprising the following steps:

acquiring an image to be extracted;

the target feature extraction model comprises a first feature extraction model and a second feature extraction model, the first feature extraction model is obtained by training based on a first fusion image obtained by fusing an original building image and an original feature image corresponding to the original building image, and the second feature extraction model is obtained by training based on a second fusion image obtained by fusing the original building image and an output image of the first feature extraction model.

2. The method of claim 1, further comprising:

3. The method of claim 2, further comprising:

in the decoder network, a nearest neighbor interpolation algorithm is adopted to carry out feature reconstruction on an output image of the encoder to obtain an output image of the first feature extraction model.

4. The method of claim 2, further comprising:

a first loss function is constructed based on the predicted loss function, the partial loss function, and the loss weight.

5. The method of claim 4, further comprising:

and constructing a prediction loss function based on the alpha channel values of the output image of the first feature extraction model and the standard image.

6. The method of claim 4, further comprising:

and constructing a partial loss function based on RGB values of a fused image of an output image of the first feature extraction model and an original background image corresponding to the original building image, and the RGB values of the original building image.

7. The method of claim 2, further comprising:

and constructing a second loss function based on the alpha channel value of the output image of the second feature extraction model, the alpha channel value of the fused image of the output image of the first feature extraction model and the alpha channel value of the standard image.

8. An image feature extraction device, comprising:

9. An apparatus, comprising:

one or more processors;

a memory for storing one or more programs;

when executed by the one or more processors, cause the one or more processors to implement the method of extracting image features of any of claims 1-7.

10. A storage medium containing computer-executable instructions for performing the method of image feature extraction according to any one of claims 1 to 7 when executed by a computer processor.